PREFACE


In 1925 the author wrote a book (Statistical Methods
for Research Workers) with the object of supplying
practical experimenters and, incidentally, teachers of 
mathematical statistics, with a connected account of 
the applications in laboratory work of some of the
more recent advances in statistical theory. Some of 
the new methods, such as the analysis of variance, 
were found to be so intimately related with problems 
of experimental design that a considerable part of 
the eighth Chapter was devoted to the technique of 
agricultural experimentation, and these Sections have 
been progressively enlarged with subsequent editions, 
in response to frequent requests for a fuller treatment 
of the subject. The design of experiments is, however, 
too large a subject, and of too great importance to 
the general body of scientific workers, for any
incidental treatment to be adequate. A clear grasp 
of simple and standardised statistical procedures 
will, as the reader may satisfy himself, go far to 
elucidate the principles of experimentation ; but 
these procedures are themselves only the means to 
a more important end. Their part is to satisfy the 
requirements of sound and intelligible experimental 
design, and to supply the machinery for unambiguous 
interpretation. To attain a clear grasp of these 
requirements we need to study designs which have 
been widely successful in many fields, and to examine 
their structure in relation to the requirements of valid 
inference. 


The examples chosen in this book are aimed at 
illustrating the principles of successful experimenta- 
tion ; first, in their simplest possible applications, 
and later, in regard to the more elaborate structures 
by which the different advantages sought may be 
combined. Statistical discussion has been reduced 
to a minimum, and all the processes required will be 
found more fully exemplified in the previous work.
The reader is, however, advised that the detailed
working of numerical examples is essential to a
thorough grasp, not only of the technique, but of
the principles by which an experimental procedure 
may be judged to be satisfactory and effective. 


Galton Laboratory, 
July 1935.
I AM very sorry, Pyrophilus, that to the many (elsewhere
enumerated) difficulties which you may meet with, and must 
therefore surmount, in the serious and effectual prosecution 
of experimental philosophy I must add one discouragement 
more, which will perhaps as much surprise as dishearten 
you ; and it is, that besides that you will find (as we 
elsewhere mention) many of the experiments published by 
authors, or related to you by the persons you converse with, 
false and unsuccessful (besides this, I say), you will meet 
with several observations and experiments which, though 
communicated for true by candid authors or undistrusted 
eye-witnesses, or perhaps recommended by your own 
experience may, upon further trial, disappoint your 
expectation, either not at all succeeding constantly or at 
least varying much from what you expected. 

Robert Boyle, 1673, Concerning the 
Unsuccessfulness of Experiments. 
INTRODUCTION

1. The Grounds on which Evidence is Disputed 

When any scientific conclusion is supposed to be
proved on experimental evidence, critics who still 
refuse to accept the conclusion are accustomed to 
take one of two lines of attack. They may claim 
that the interpretation of the experiment is faulty, 
that the results reported are not in fact those which 
should have been expected had the conclusion drawn 
been justified, or that they might equally well have 
arisen had the conclusion drawn been false. Such 
criticisms of interpretation are usually treated as 
falling within the domain of statistics. They are
often made by professed statisticians against the 
work of others whom they regard as ignorant of or 
incompetent in statistical technique ; and, since the 
interpretation of any considerable body of data is 
likely to involve computations, it is natural enough 
that questions involving the logical implications of 
the results of the arithmetical processes employed,
should be relegated to the statistician. At least I 
make no complaint of this convention. The statistician 
cannot evade the responsibility for understanding the 
processes he applies or recommends. My immediate 
point is that the questions involved can be dis-
sociated from all that is strictly technical in the
statistician’s craft, and, when so detached, are questions 
only of the right use of human reasoning powers, 
with which all intelligent people, who hope to be 
intelligible, are equally concerned, and on which the 
statistician, as such, speaks with no special authority. 
The statistician cannot excuse himself from the duty 
of getting his head clear on the principles of scientific 
inference, but equally no other thinking man can
avoid a like obligation. 

The other type of criticism to which experimental 
results are exposed is that the experiment itself was 
ill designed, or, of course, badly executed. If we 
suppose that the experimenter did what he intended
to do, both of these points come down to the question 
of the design, or the logical structure of the experiment. 
This type of criticism is usually made by what I 
might call a heavyweight authority. Prolonged 
experience, or at least the long possession of a 
scientific reputation, is almost a pre-requisite for 
developing successfully this line of attack. Technical 
details are seldom in evidence. The authoritative 
assertion “ His controls are totally inadequate ” must 
have temporarily discredited many a promising line 
of work ; and such an authoritarian method of 
judgment must surely continue, human nature being
what it is, so long as theoretical notions of the 
principles of experimental design are lacking — notions 
just as clear and explicit as we are accustomed to
apply to technical details. 
Now the essential point is that the two sorts of
criticism I have mentioned come logically to the 
same thing, although they are usually delivered by 
different sorts of people and in very different language. 

If the design of an experiment is faulty, any method 
of interpretation which makes it out to be decisive 
must be faulty too. It is true that there are a great
many experimental procedures which are well designed
in that they may lead to decisive conclusions, but on
other occasions may fail to do so ; in such cases, if
decisive conclusions are in fact drawn when they are 
unjustified, we may say that the fault is wholly in 
the interpretation, not in the design. But the fault 
of interpretation, even in these cases, lies in over- 
looking the characteristic features of the design which 
lead to the result being sometimes inconclusive, or 
conclusive on some questions but not on all. To 
understand correctly the one aspect of the problem 
is to understand the other. Statistical procedure 
and experimental design are only two different aspects 
of the same whole, and that whole is the logical 
requirements of the complete process of adding to 
natural knowledge by experimentation. 

2. The Mathematical Attitude towards Induction 

In the foregoing paragraphs the subject-matter of 
this book has been regarded from the point of view
of an experimenter, who wishes to carry out his work 
competently, and having done so wishes to safeguard 
his results, so far as they are validly established, 
from ignorant criticism by different sorts of superior 
persons. I have assumed, as the experimenter always
does assume, that it is possible to draw valid inferences 
from the results of experimentation ; that it is possible 
to argue from consequences to causes, from observa- 
tions to hypotheses ; as a statistician would say, 
from a sample to the population from which the 
sample was drawn, or, as a logician might put it, 
from the particular to the general. It is, however, 
certain that many mathematicians, if pressed on the 
point, would say that it is not possible rigorously to 
argue from the particular to the general ; that all 
such arguments must involve some sort of guesswork, 
which they might admit to be plausible guesswork, 
but the rationale of which, they would be unwilling, 
as mathematicians, to discuss. We may at once 
admit that any inference from the particular to the 
general must be attended with some degree of un- 
certainty, but this is not the same as to admit that 
such inference cannot be absolutely rigorous, for the 
nature and degree of the uncertainty may itself be 
capable of rigorous expression. In the theory of 
probability, as developed in its application to games 
of chance, we have the classic example proving this 
possibility. If the gamblers’ apparatus are really 
true or unbiased, the probabilities of the different 
possible events, or combinations of events, can be 
inferred by a rigorous deductive argument, although 
the outcome of any particular game is recognised to 
be uncertain. The mere fact that inductive inferences 
are uncertain cannot, therefore, be accepted as pre- 
cluding perfectly rigorous and unequivocal inference. 
Naturally, writers on probability have made deter-
mined efforts to include the problem of inductive
inference within the ambit of the theory of mathe- 
matical probability, developed in discussing deductive 
problems arising in games of chance. To illustrate 
how much was at one time thought to have been 
achieved in this way, I may quote a very lucid 
statement by Augustus de Morgan, published in 1838, 
in the preface to his essay on probabilities in The 
Cabinet Cyclopedia. At this period confidence in the
theory of inverse probability, as it was called, had 
reached, under the influence of Laplace, its highest 
point. Boole’s criticisms had not yet been made, 
nor the more decided rejection of the theory by Venn, 
Chrystal, and later writers. De Morgan is speaking 
of the advances in the theory which were leading to
its wider application to practical problems. 

“ There was also another circumstance which 
stood in the way of the first investigators, namely, 
the not having considered, or, at least, not having
discovered the method of reasoning from the happen- 
ing of an event to the probability of one or another 
cause. The questions treated in the third chapter 
of this work could not therefore be attempted by
them. Given an hypothesis presenting the necessity 
of one or another out of a certain, and not very 
large, number of consequences, they could deter- 
mine the chance that any given one or other 
of those consequences should arrive ; but given an 
event as having happened, and which might have 
been the consequence of either of several different 
causes, or explicable by either of several different
hypotheses, they could not infer the probability with 
which the happening of the event should cause the 
different hypotheses to be viewed. But, just as in 
natural philosophy the selection of an hypothesis by 
means of observed facts is always preliminary to any 
attempt at deductive discovery ; so in the application 
of the notion of probability to the actual affairs of 
life, the process of reasoning from observed events 
to their most probable antecedents must go before the 
direct use of any such antecedent, cause, hypothesis, 
or whatever it may be correctly termed. These 
two obstacles, therefore, the mathematical difficulty, 
and the want of an inverse method, prevented the 
science from extending its views beyond problems 
of that simple nature which games of chance 
present.” 

Referring to the inverse method he later adds : 
“ This was first used by the Rev. T. Bayes, and 
the author, though now almost forgotten, deserves
the most honourable remembrance from all who treat 
the history of this science.” 

3. The Rejection of Inverse Probability 

Whatever may have been true in 1838, it is 
certainly not true to-day that Thomas Bayes is 
almost forgotten. That he seems to have been the 
first man in Europe to have seen the importance 
of developing an exact and quantitative theory of 
inductive reasoning, of arguing from observational 
facts to the theories which might explain them, is 
surely a sufficient claim to a place in the history of
science. But he deserves honourable remembrance 
for one fact, also, in addition to those mentioned by 
de Morgan. Having perceived the problem and 
devised an axiom which, if its truth were granted, 
would bring inverse inferences within the scope of 
the theory of mathematical probability, he was 
sufficiently critical of its validity to withhold his entire 
treatise from publication, until his doubts should 
have been satisfied. In the event, the work was
published after his death by his friend, Price, and we 
cannot say what views he ultimately held on the 
subject. 

The discrepancy of opinion among historical 
writers on probability is so great that to mention the 
subject is unavoidable. It would, however, be out 
of place here to argue the point in detail. I will only 
state three considerations which will explain why, 
in the practical applications of the subject, I shall not 
assume the truth of Bayes’ axiom. Two of these 
reasons would, I think, be generally admitted, but 
the first, I can well imagine, might be indignantly 
repudiated in some quarters. The first is this : The 
axiom leads to apparent mathematical contradictions. 
In explaining these contradictions away, advocates 
of inverse probability seem forced to regard mathe-
matical probability, not as an objective quantity
measured by observed frequencies, but as measuring 
merely psychological tendencies, theorems respecting 
which are useless for scientific purposes. 

My second reason is that it is the nature of an 
axiom that its truth should be apparent to any
rational mind which fully apprehends its meaning. 
The axiom of Bayes has certainly been fully appre- 
hended by a good many rational minds, including 
that of its author, without carrying this conviction of 
necessary truth. This, alone, shows that it cannot 
be accepted as the axiomatic basis of a rigorous 
argument. 

My third reason is that inverse probability has 
been only very rarely used in the justification of 
conclusions from experimental facts, although the 
theory has been widely taught, and is widespread in 
the literature of probability. Whatever the reasons 
are which give experimenters confidence that they 
can draw valid conclusions from their results, they 
seem to act just as powerfully whether the experimenter 
has heard of the theory of inverse probability or not. 

4. The Logic of the Laboratory 

In fact, in the course of this book, I propose to 
consider a number of different types of experimenta- 
tion, with especial reference to their logical structure, 
and to show that when the appropriate precautions 
are taken to make this structure complete, entirely 
valid inferences may be drawn from them, without 
using the disputed axiom. If this can be done, we 
shall, in the course of studies having directly practical
aims, have overcome the theoretical difficulty of 
inductive inferences. 

Inductive inference is the only process known to 
us by which essentially new knowledge comes into 
the world. To make clear the authentic conditions of
its validity is the kind of contribution to the intellectual 
development of mankind which we should expect 
experimental science would ultimately supply. Men 
have always been capable of some mental processes 
of the kind we call “ learning by experience.” Doubt- 
less this experience was often a very imperfect basis, 
and the reasoning processes used in interpreting it 
were very insecure ; but there must have been in 
these processes a sort of embryology of knowledge,
by which new knowledge was gradually produced. 
Experimental observations are only experience care- 
fully planned in advance, and designed to form a 
secure basis of new knowledge ; that is, they are 
systematically related to the body of knowledge 
already acquired, and the results are deliberately
observed, and put on record accurately. As the art 
of experimentation advances the principles should 
become clear by virtue of which this planning and
designing achieve their purpose.

It is as well to remember in this connection that 
the principles and methods of even deductive reason- 
ing were probably unknown for several thousand 
years after the establishment of prosperous and 
cultured civilisations. We take a knowledge of these 
principles for granted, only because geometry is 
universally taught in schools. The method and
material taught is essentially that of Euclid’s text-
book of the third century B.C., and no one can make
any progress in that subject without thoroughly 
familiarising his mind with the requirements of a 
precise deductive argument. Assuming the axioms,
the body of their logical consequences is built up 
systematically and without ambiguity. Yet it is 
certainly something of an accident historically that 
this particular discipline should have become fashion- 
able in the Greek Universities, and later embodied 
in the curricula of secondary education. It would be 
difficult to overstate how much the liberty of human 
thought has owed to this fortunate circumstance. 
Since Euclid’s time there have been very long periods
during which the right of unfettered individual 
judgment has been successfully denied in legal, moral, 
and historical questions, but in which it has, none the 
less, survived, so far as purely deductive reasoning 
is concerned, within the shelter of apparently harmless 
mathematical studies. 

The liberation of the human intellect must, how- 
ever, remain incomplete so long as it is free only to 
work out the consequences of a prescribed body of 
dogmatic data, and is denied the access to unsuspected 
truths, which only direct observation can give. The 
development of experimental science has therefore 
done much more than to multiply the technical 
competence of mankind ; and if, in these introductory 
lines, I have seemed to wander far from the immediate 
purpose of this book, it is only because the two 
topics with which we shall be concerned, the arts of 
experimental design and of the valid interpreta- 
tion of experimental results, in so far as they can 
be technically perfected, must constitute the core of
this claim to the exercise of full intellectual liberty.

The chapters which follow are designed to illus-
trate the principles which are common to all
experimentation, by means of examples chosen for 
the simplicity with which these principles are brought 
out. Next, to exhibit the principal designs which 
have been found successful in that field of experi-
mentation, namely agriculture, in which questions of
design have been most thoroughly studied, and to 
illustrate their applicability to other fields of work. 
Many of the most useful designs are extremely 
simple, and these deserve the greatest attention, as 
showing in what ways, and on what occasions, greater 
elaboration may be advantageous. The careful reader 
should be able to satisfy himself not only, in detail, 
why some experiments have a complex structure, but 
also how a complex observational record may be 
handled with intelligibility and precision. 

The subject is a new one, and in many ways the 
most that the author can hope is to suggest possible 
lines of attack on the problems with which others 
are confronted. Progress in recent years has been 
rapid, and the few sections devoted to the subject in 
the author’s Statistical Methods for Research Workers,
first published in 1925, have, with each succeeding 
edition, come to appear more and more inadequate. 
On purely statistical questions the reader must be 
referred to that book. The present volume is an 
attempt to do more thorough justice to the problems 
of planning and foresight with which the experimenter
is confronted.


THE PRINCIPLES OF EXPERIMENTATION,
ILLUSTRATED BY A PSYCHO-PHYSICAL 
EXPERIMENT 

5. Statement of Experiment 

A lady declares that by tasting a cup of tea made 
with milk she can discriminate whether the milk or 
the tea infusion was first added to the cup. We will 
consider the problem of designing an experiment by 
means of which this assertion can be tested. For 
this purpose let us first lay down a simple form of 
experiment with a view to studying its limitations 
and its characteristics, both those which appear to 
be essential to the experimental method, when well 
developed, and those which are not essential but 
auxiliary. 

Our experiment consists in mixing eight cups of 
tea, four in one way and four in the other, and 
presenting them to the subject for judgment in a 
random order. The subject has been told in advance
of what the test will consist, namely that she will 
be asked to taste eight cups, that these shall be four 
of each kind, and that they shall be presented to her
in a random order, that is in an order not determined 
arbitrarily by human choice, but by the actual 
manipulation of the physical apparatus used in games 
of chance, cards, dice, roulettes, etc., or, more
expeditiously, from a published collection of random
sampling numbers purporting to give the actual 
results of such manipulation. Her task is to divide 
the 8 cups into two sets of 4, agreeing, if possible, 
with the treatments received. 
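
The randomisation described above may be sketched in a few lines of Python. This is a minimal illustration only, assuming that a software pseudo-random generator is allowed to stand in for the cards, dice, or published random sampling numbers of the text; the function name `random_presentation` and the labels "milk first" and "tea first" are hypothetical.

```python
import random

def random_presentation(n_each=4, seed=None):
    """Return a random order of 2*n_each cups, n_each of each kind.

    Assumption: Python's pseudo-random generator replaces the physical
    apparatus (cards, dice, roulettes) described in the text.
    """
    rng = random.Random(seed)
    cups = ["milk first"] * n_each + ["tea first"] * n_each
    rng.shuffle(cups)  # every ordering of the cups is equally likely
    return cups

order = random_presentation(seed=1935)
for position, treatment in enumerate(order, start=1):
    print(position, treatment)
```

The essential point carried by the sketch is that the order is produced by the generator and not by the experimenter's own choice; fixing the seed merely makes the record reproducible.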

6. Interpretation and its Reasoned Basis 

In considering the appropriateness of any proposed 
experimental design, it is always needful to forecast 
all possible results of the experiment, and to have
decided without ambiguity what interpretation shall 
be placed upon each one of them. Further, we must 
know by what argument this interpretation is to be 
sustained. In the present instance we may argue 
as follows. There are 70 ways of choosing a group 
of 4 objects out of 8. This may be demonstrated by
an argument familiar to students of “ permutations 
and combinations,” namely, that if we were to choose 
the 4 objects in succession we should have succes- 
sively 8, 7, 6, 5 objects to choose from, and could 
make our succession of choices in 8 × 7 × 6 × 5, or 1680
ways. But in doing this we have not only chosen 
every possible set of 4, but every possible set in every 
possible order ; and since 4 objects can be arranged 
in order in 4 × 3 × 2 × 1, or 24 ways, we may find the
number of possible choices by dividing 1680 by 24. 
The result, 70, is essential to our interpretation of 
the experiment. At best the subject can judge rightly 
with every cup and, knowing that 4 are of each kind, 
this amounts to choosing, out of the 70 sets of 4 which
might be chosen, that particular one which is correct.
A subject without any faculty of discrimination would
in fact divide the 8 cups correctly into two sets of 
4 in one trial out of 70, or, more properly, with a 
frequency which would approach 1 in 70 more and 
more nearly the more often the test were repeated. 
Evidently this frequency, with which unfailing success 
would be achieved by a person lacking altogether
the faculty under test, is calculable from the number 
of cups used. The odds could be made much higher
by enlarging the experiment, while, if the experiment 
were much smaller even the greatest possible success 
would give odds so low that the result might, with 
considerable probability, be ascribed to chance. 
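
The counting argument of this section may be checked mechanically. The following Python sketch is an illustration only, in which the function name `chance_of_perfect_score` is hypothetical. It reproduces the division of 1680 by 24, and then shows how the chance of unfailing success by a subject without discrimination falls as the number of cups of each kind is increased.

```python
from math import comb, factorial

# Ordered choices of 4 cups out of 8: 8 x 7 x 6 x 5.
ordered = 8 * 7 * 6 * 5
# Each set of 4 has been counted once for every order in which it can be arranged.
arrangements = factorial(4)
sets_of_four = ordered // arrangements

print(ordered, arrangements, sets_of_four)  # 1680 24 70
assert sets_of_four == comb(8, 4)

def chance_of_perfect_score(n_each):
    """Chance that a subject with no discrimination sorts every cup correctly,
    when there are n_each cups of each kind and she knows this."""
    return 1 / comb(2 * n_each, n_each)

# Enlarging the experiment lengthens the odds; shrinking it shortens them.
for n_each in (2, 3, 4, 5, 6):
    print(n_each, comb(2 * n_each, n_each), chance_of_perfect_score(n_each))
```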

7. The Test of Significance 

It is open to the experimenter to be more or less
exacting in respect of the smallness of the probability 
he would require before he would be willing to admit 
that his observations have demonstrated a positive 
result. It is obvious that an experiment would be 
useless of which no possible result would satisfy him. 
Thus, if he wishes to ignore results having probabilities 
as high as 1 in 20 — the probabilities being of course 
reckoned from the hypothesis that the phenomenon 
to be demonstrated is in fact absent— then it would 
be useless for him to experiment with only 3 cups of 
tea of each kind. For 3 objects can be chosen out 
of 6 in only 20 ways, and therefore complete success 
in the test would be achieved without sensory dis- 
crimination, i.e. by “ pure chance,” in an average of 
5 trials out of 100. It is usual and convenient for 
experimenters to take 5 per cent as a standard level
of significance, in the sense that they are prepared to 
ignore all results which fail to reach this standard, 
and, by this means, to eliminate from further dis- 
cussion the greater part of the fluctuations which 
chance causes have introduced into their experimental 
results. No such selection can eliminate the whole 
of the possible effects of chance coincidence, and if 
we accept this convenient convention, and agree that
an event which would occur by chance only once in
70 trials is decidedly “ significant,” in the statistical 
sense, we thereby admit that no isolated experiment,
however significant in itself, can suffice for the 
experimental demonstration of any natural phe- 
nomenon ; for the “ one chance in a million ” will 
undoubtedly occur, with no less and no more than its
appropriate frequency, however surprised we may be 
that it should occur to us. In order to assert that a 
natural phenomenon is experimentally demonstrable 
we need, not an isolated record, but a reliable method 
of procedure. In relation to the test of significance, 
we may say that a phenomenon is experimentally 
demonstrable when we know how to conduct an 
experiment which will rarely fail to give us a statist- 
ically significant result. 
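
The remark that three cups of each kind would be useless at the 5 per cent level may be verified by a short calculation. The Python sketch below is a minimal illustration, assuming that "ignoring results having probabilities as high as 1 in 20" is read as requiring a probability strictly below 1 in 20; the names `best_possible_probability` and `smallest_useful_design` are hypothetical.

```python
from fractions import Fraction
from math import comb

LEVEL = Fraction(1, 20)  # the conventional 5 per cent level

def best_possible_probability(n_each):
    """Chance probability of complete success with n_each cups of each kind."""
    return Fraction(1, comb(2 * n_each, n_each))

def smallest_useful_design(level=LEVEL):
    """Smallest n_each for which complete success would fall below the level.

    Assumption: a probability exactly equal to the level is not accepted.
    """
    n_each = 1
    while best_possible_probability(n_each) >= level:
        n_each += 1
    return n_each

print(best_possible_probability(3))  # 1/20
print(best_possible_probability(4))  # 1/70
print(smallest_useful_design())      # 4
```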

Returning to the possible results of the psycho- 
physical experiment, having decided that if every cup
were rightly classified a significant positive result 
would be recorded, or, in other words, that we should 
admit that the lady had made good her claim, what 
should be our conclusion if, for each kind of cup, her
judgments are 3 right and 1 wrong ? We may take
it, in the present discussion, that any error in one 
set of judgments will be compensated by an error 
in the other, since it is known to the subject that 
there are 4 cups of each kind. In enumerating the 
number of ways of choosing 4 things out of 8, such 
that 3 are right and 1 wrong, we may note that the
3 right may be chosen out of the 4 available in 4
ways and, independently of this choice, that the 1
wrong may be chosen, out of the 4 available, also in
4 ways. So that in all we could make a selection of
the kind supposed in 16 different ways. A similar 
argument shows that, in each kind of judgment, 2 
may be right and 2 wrong in 36 ways, 1 right and 
3 wrong in 16 ways and none right and 4 wrong in 
1 way only. It should be noted that the frequencies 
of these five possible results of the experiment make 
up together, as it is obvious they should, the 70 cases 
out of 70. 
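
The five frequencies just enumerated may be reproduced as follows. This Python sketch is an illustration only; the function name `ways` is hypothetical. It multiplies the number of ways of choosing the right cups by the number of ways of choosing the wrong ones, exactly as in the argument above.

```python
from math import comb

def ways(k_right, n_each=4):
    """Ways of selecting n_each cups so that k_right are right and the rest wrong."""
    right = comb(n_each, k_right)            # right cups, out of the n_each available
    wrong = comb(n_each, n_each - k_right)   # wrong cups, out of the other n_each
    return right * wrong

counts = {k: ways(k) for k in range(4, -1, -1)}
print(counts)  # {4: 1, 3: 16, 2: 36, 1: 16, 0: 1}

# The five possible results together exhaust the 70 cases.
assert sum(counts.values()) == comb(8, 4)
```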

It is obvious, too, that 3 successes to 1 failure, 
although showing a bias, or deviation, in the 
right direction, could not be judged as statistically 
significant evidence of a real sensory discrimination. 
For its frequency of chance occurrence is 16 in 70, 
or more than 20 per cent. Moreover, it is not the 
best possible result, and in judging of its significance 
we must take account not only of its own frequency, 
but also of the frequency of any better result. In 
the present instance “ 3 right and 1 wrong ” occurs 
16 times, and “ 4 right ” occurs once in 70 trials, 
making 17 cases out of 70 as good as or better than
that observed. The reason for including cases better
than that observed becomes obvious on considering 
what our conclusions would have been had the case 
of 3 right and 1 wrong only 1 chance, and the case of
4 right 16 chances of occurrence out of 70. The rare
case of 3 right and 1 wrong could not be judged 
significant merely because it was rare, seeing that a 
higher degree of success would frequently have been 
scored by mere chance. 
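
The rule of counting every result as good as or better than that observed may be sketched in Python. This is a minimal illustration under the null hypothesis of the text; the function name `tail_probability` is hypothetical.

```python
from fractions import Fraction
from math import comb

def tail_probability(k_right, n_each=4):
    """Chance frequency of k_right or more correct choices out of n_each,
    for a subject with no discrimination who knows there are n_each of each kind."""
    total = comb(2 * n_each, n_each)
    as_good_or_better = sum(
        comb(n_each, k) * comb(n_each, n_each - k)
        for k in range(k_right, n_each + 1)
    )
    return Fraction(as_good_or_better, total)

print(tail_probability(3))  # 17/70
print(tail_probability(4))  # 1/70

# "3 right and 1 wrong" is far from the 5 per cent level; "4 right" passes it.
print(tail_probability(3) < Fraction(1, 20))  # False
print(tail_probability(4) < Fraction(1, 20))  # True
```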

8. The Null Hypothesis 

Our examination of the possible results of the 
experiment has therefore led us to a statistical test 
of significance, by which these results are divided
into two classes with opposed interpretations. Tests 
of significance are of many different kinds, which 
need not be considered here. Here we are only 
concerned with the fact that the easy calculation in 
permutations which we encountered, and which gave 
us our test of significance, stands for something 
present in every possible experimental arrangement ; 
or, at least, for something required in its interpretation. 
The two classes of results which are distinguished by 
our test of significance are, on the one hand, those 
which show a significant discrepancy from a certain 
hypothesis ; namely, in this case, the hypothesis that 
the judgments given are in no way influenced by the 
order in which the ingredients have been added ; and 
on the other hand, results which show no significant 
discrepancy from this hypothesis. This hypothesis, 
which may or may not be impugned by the result of 
an experiment, is again characteristic of all experi-
mentation. Much confusion would often be avoided
if it were explicitly formulated when the experiment 
is designed. In relation to any experiment we may 
speak of this hypothesis as the “ null hypothesis,” 
and it should be noted that the null hypothesis is 
never proved or established, but is possibly disproved, 
in the course of experimentation. Every experiment 
may be said to exist only in order to give the facts a
chance of disproving the null hypothesis.

It might be argued that if an experiment can 
disprove the hypothesis that the subject possesses no 
sensory discrimination between two different sorts of 
object, it must therefore be able to prove the opposite 
hypothesis, that she can make some such discrimina-
tion. But this last hypothesis, however reasonable
or true it may be, is ineligible as a null hypothesis 
to be tested by experiment, because it is inexact. If 
it were asserted that the subject would never be 
wrong in her judgments we should again have an 
exact hypothesis, and it is easy to see that this 
hypothesis could be disproved by a single failure, 
but could never be proved by any finite amount of 
experimentation. It is evident that the null hypothesis 
must be exact, that is free from vagueness and 
ambiguity, because it must supply the basis of the 
“ problem of distribution,” of which the test of 
significance is the solution. A null hypothesis may, 
indeed, contain arbitrary elements, and in more 
complicated cases often does so. As, for example, 
if it should assert that the death-rates of two groups 
of animals are equal, without specifying what these
death-rates actually are. In such cases it is evidently 
the equality rather than any particular values of the 
death-rates that the experiment is designed to test, 
and possibly to disprove. The “ error,” so called, 
of accepting the null hypothesis “ when it is false,” 
is thus always ill-defined both in magnitude and 
frequency. We may, however, choose any null 
hypothesis we please, provided it is exact. 

9. Randomisation ; the Physical Basis of the Validity
of the Test

We have spoken of the experiment as testing a 
certain null hypothesis, namely, in this case, that 
the subject possesses no sensory discrimination what- 
ever of the kind claimed ; we have, too, assigned as 
appropriate to this hypothesis a certain frequency 
distribution of occurrences, based on the equal 
frequency of the 70 possible ways of assigning 8 
objects to two classes of 4 each ; in other words, the 
frequency distribution appropriate to a classification 
by pure chance. We have now to examine the 
physical conditions of the experimental technique 
needed to justify the assumption that, if discrimination 
of the kind under test is absent, the result of the 
experiment will be wholly governed by the laws of 
chance. It is easy to see that it might well be other- 
wise. If all those cups made with the milk first had 
sugar added, while those made with the tea first had 
none, a very obvious difference in flavour would 
have been introduced which might well ensure that 
all those made with sugar should be classed alike.
These groups might either be classified all right or 
all wrong, but in such a case the frequency of the 
critical event in which all cups are classified correctly 
would not be 1 in 70, but 35 in 70 trials, and the test
of significance would be wholly vitiated. Errors 
equivalent in principle to this are very frequently 
incorporated in otherwise well-designed experiments. 
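
The enumeration behind the figure of 1 in 70 can be checked directly. What follows is a minimal Python sketch and no part of the original text ; the cup labels 0 to 7 and the names `true_milk_first`, `ways` and `counts` are assumptions made only for illustration.

```python
from collections import Counter
from itertools import combinations

cups = range(8)                              # eight cups, arbitrarily labelled 0..7
true_milk_first = frozenset({0, 1, 2, 3})    # assumed set of cups made milk first

# Every way in which 4 of the 8 cups can be named as "milk first".
ways = [frozenset(c) for c in combinations(cups, 4)]

# On the null hypothesis each way is equally frequent; tally how many
# of the milk-first cups each possible selection names correctly.
counts = Counter(len(w & true_milk_first) for w in ways)

print(len(ways))                  # number of possible selections
for k in sorted(counts):
    print(k, counts[k])           # selections with k milk-first cups right
```

Only one of the possible selections is wholly correct, and it is this frequency on which the test of significance relies.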

It is no sufficient remedy to insist that “ all the 
cups must be exactly alike ” in every respect except
that to be tested. For this is a totally impossible 
requirement in our example, and equally in all other 
forms of experimentation. In practice it is probable 
that the cups will differ perceptibly in the thickness or
smoothness of their material, that the quantities of 
milk added to the different cups will not be exactly 
equal, that the strength of the infusion of tea may 
change between pouring the first and the last cup, 
and that the temperature at which the tea is tasted 
will change during the course of the experiment. 
These are only examples of the differences probably 
present ; it would be impossible to present an 
exhaustive list of such possible differences appropriate 
to any one kind of experiment, because the un- 
controlled causes which may influence the result are 
always strictly innumerable. When any such cause 
is named, it is usually perceived that, by increased 
labour and expense, it could be largely eliminated. 
Too frequently it is assumed that such refinements 
constitute improvements to the experiment. Our 
view, which will be much more fully exemplified in 
a later section, is that it is an essential characteristic
of experimentation that it is carried out with limited
resources, and an essential part of the subject of
experimental design to ascertain how these should 
be best applied ; or, in particular, to which causes of 
disturbance care should be given, and which ought 
to be deliberately ignored. To ascertain, too, for 
those which are not to be ignored, to what extent it 
is worth while to take the trouble to diminish their 
magnitude. For our present purpose, however, it is 
only necessary to recognise that, whatever degree of 
care and experimental skill is expended in equalising 
the conditions, other than the one under test, which 
are liable to affect the result, this equalisation must 
always be to a greater or less extent incomplete, and 
in many important practical cases will certainly be 
grossly defective. We are concerned, therefore, that 
this inequality, whether it be great or small, shall 
not impugn the exactitude of the frequency dis- 
tribution, on the basis of which the result of the 
experiment is to be appraised. 

10. The Effectiveness of Randomisation 

The element in the experimental procedure which 
contains the essential safeguard, is that the two 
modifications of the test beverage are to be prepared 
“ in random order.” This, in fact, is the only point
in the experimental procedure in which the laws of 
chance, which are to be in exclusive control of our 
frequency distribution, have been explicitly intro- 
duced. The phrase “ random order ” itself, however, 
must be regarded as an incomplete instruction,
standing as a kind of shorthand symbol for the full 
procedure of randomisation, by which the validity 
of the test of significance may be guaranteed against 
corruption by the causes of disturbance which have 
not been eliminated. To demonstrate that, with 
satisfactory randomisation, its validity is, indeed, 
wholly unimpaired, let us imagine all causes of 
disturbance — the strength of the infusion, the quantity 
of milk, the temperature at which it is tasted, etc. — 
to be predetermined for each cup ; then since these, 
on the null hypothesis, are the only causes influencing 
classification, we may say that the probabilities of 
each of the 70 possible choices or classifications which 
the subject can make are also predetermined. If, 
now, after the disturbing causes are fixed, we assign, 
strictly at random, 4 out of the 8 cups to each of our 
experimental treatments, then every set of 4, whatever 
its probability of being so classified, will certainly have 
a probability of exactly 1 in 70 of being the 4, for
example, to which the milk is added first. However 
important the causes of disturbance may be, even if 
they were to make it certain that one particular set 
of 4 should receive this classification, the probability 
that the 4 so classified and the 4 which ought to have 
been so classified should be the same, must be 
rigorously in accordance with our test of significance. 
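
The argument of this paragraph may be verified by exact enumeration. The Python sketch below is an illustration only, under the assumption that the disturbing causes have already fixed which 4 cups the subject will name ; the function name `chance_of_full_match` and the set `favoured` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

cups = range(8)
# The 70 equally likely ways of assigning 4 of the 8 cups to "milk first".
assignments = [frozenset(c) for c in combinations(cups, 4)]

def chance_of_full_match(chosen):
    """Probability, taken over the random assignment, that the 4 cups the
    subject names (fixed beforehand by the disturbing causes) are exactly
    the 4 cups to which the milk was added first."""
    hits = sum(1 for a in assignments if a == chosen)
    return Fraction(hits, len(assignments))

# Extreme case of the text: the disturbances make one particular set certain.
favoured = frozenset({0, 2, 5, 7})     # arbitrary illustrative set
print(chance_of_full_match(favoured))

# The same value is obtained for every possible predetermined selection.
all_equal = all(chance_of_full_match(s) == Fraction(1, 70) for s in assignments)
print(all_equal)
```

If the subject's selection is not certain but merely has predetermined probabilities, the chance of a complete match is an average of these equal values, and so is unchanged.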

It is apparent, therefore, that the random choice 
of the objects to be treated in different ways would be 
a complete guarantee of the validity of the test of 
significance, if these treatments were the last in time
of the stages in the physical history of the objects 
which might affect their experimental reaction. The 
circumstance that the experimental treatments cannot 
always be applied last, and may come relatively early 
in their history, causes no practical inconvenience ; 
for, subsequent causes of differentiation, if under the 
experimenter’s control as, for example, the choice of 
different pipettes to be used with different flasks, 
can either be predetermined before the treatments 
have been randomised, or, if this has not been done,
can be randomised on their own account ; and other
causes of differentiation will be either (a) consequences
of differences already randomised, or (b)
natural consequences of the difference in treatment
to be tested, of which on the null hypothesis there
will be none, by definition, or (c) effects supervening
by chance independently from the treatments applied. 
Apart, therefore, from the avoidable error of the 
experimenter himself introducing with his test treat- 
ments, or subsequently, other differences in treatment, 
the effects of which the experiment is not intended to 
study, it may be said that the simple precaution of 
randomisation will suffice to guarantee the validity 
of the test of significance, by which the result of the 
experiment is to be judged. 

11. The Sensitiveness of an Experiment. Effects of 
Enlargement and Repetition 

A probable objection, which the subject might 
well make to the experiment so far described, is that 
only if every cup is classified correctly will she be 
judged successful. A single mistake will reduce her
performance below the level of significance. Her 
claim, however, might be, not that she could draw 
the distinction with invariable certainty, but that, 
though sometimes mistaken, she would be right more 
often than not ; and that the experiment should 
be enlarged sufficiently, or repeated sufficiently 
often, for her to be able to demonstrate the 
predominance of correct classifications in spite of 
occasional errors. 

An extension of the calculation upon which the 
test of significance was based shows that an experi- 
ment with 12 cups, six of each kind, gives, on the 
null hypothesis, 1 chance in 924 for complete success, 
and 36 chances for 5 of each kind classified right and 
1 wrong. As 37 is less than a twentieth of 924, such 
a test could be counted as significant, although a pair 
of cups have been wrongly classified ; and it is easy 
to verify that, using larger numbers still, a significant 
result could be obtained with a still higher proportion 
of errors. By increasing the size of the experiment, 
we can render it more sensitive, meaning by this
that it will allow of the detection of a lower degree 
of sensory discrimination, or, in other words, of a 
quantitatively smaller departure from the null hypo- 
thesis. Since in every case the experiment is capable 
of disproving, but never of proving this hypothesis, 
we may say that the value of the experiment is 
increased whenever it permits the null hypothesis to 
be more readily disproved. 
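
The counts quoted for the experiment with 12 cups can be reproduced as follows. This is a minimal Python sketch, assuming, as in the text, that the subject knows how many cups of each kind to expect ; the function name `null_counts` is hypothetical.

```python
from math import comb

def null_counts(n_each, max_wrong_each):
    """Number of classifications, out of all equally likely ones, in which
    at most `max_wrong_each` cups of each kind are misplaced, when
    `n_each` cups of each kind are presented."""
    total = comb(2 * n_each, n_each)
    # j cups of one kind are missed and j of the other wrongly included.
    favourable = sum(comb(n_each, j) ** 2 for j in range(max_wrong_each + 1))
    return favourable, total

favourable, total = null_counts(6, 1)
print(favourable, total, favourable / total)
```

With 8 cups the same function gives 17 chances in 70 for at most one cup of each kind wrong, which is far above one in twenty ; this is why a single mistake is fatal in the smaller experiment.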

The same result could be achieved by repeating 
the experiment, as originally designed, upon a
number of different occasions, counting as a success 
all those occasions on which 8 cups are correctly 
classified. The chance of success on each occasion 
being 1 in 70, a simple application of the theory of
probability shows that 2 or more successes in 10 trials 
would occur, by chance, with a frequency below the 
standard chosen for testing significance ; so that 
the sensory discrimination would be demonstrated, 
although, in 8 attempts out of 10, the subject made
one or more mistakes. This procedure may be 
regarded as merely a second way of enlarging the 
experiment and, thereby, increasing its sensitiveness, 
since in our final calculation we take account of the
aggregate of the entire series of results, whether 
successful or unsuccessful. It would clearly be 
illegitimate, and would rob our calculation of its 
basis, if the unsuccessful results were not all brought 
into the account. 
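
The "simple application of the theory of probability" referred to is the binomial calculation. A minimal Python sketch follows, assuming independent trials each with a chance of success of 1 in 70 ; the function name `prob_at_least` is hypothetical.

```python
from math import comb

def prob_at_least(k, n, p):
    """Binomial probability of k or more successes in n independent trials,
    each succeeding with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

p_success = 1 / 70                       # all 8 cups right on one occasion
p_two_or_more = prob_at_least(2, 10, p_success)
p_one_or_more = prob_at_least(1, 10, p_success)

print(round(p_two_or_more, 4))           # compare with the 1-in-20 standard
print(round(p_one_or_more, 4))
```

Two or more successes fall well below the standard of one in twenty, whereas a single success in 10 trials does not ; and the calculation holds only if all 10 occasions, successful or not, are counted.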

12. Qualitative Methods of increasing Sensitiveness 

Instead of enlarging the experiment we may 
attempt to increase its sensitiveness by qualitative 
improvements ; and these are, generally speaking, of 
two kinds : (a) the reorganisation of its structure,
and (b) refinements of technique. To illustrate a
change of structure we might consider that, instead 
of fixing in advance that 4 cups should be of each 
kind, determining by a random process how the 
subdivision should be effected, we might have allowed 
the treatment of each cup to be determined independently
by chance, as by the toss of a coin, so that each
treatment has an equal chance of being chosen. The 
chance of classifying correctly 8 cups randomised in 
this way, without the aid of sensory discrimination, 
is 1 in $2^8$, or 1 in 256 chances, and there are only
8 chances of classifying 7 right and 1 wrong ; conse- 
quently the sensitiveness of the experiment has been 
increased, while still using only 8 cups, and it is 
possible to score a significant success, even if one is
classified wrongly. In many types of experiment,
therefore, the suggested change in structure would be 
evidently advantageous. For the special require- 
ments of a psycho-physical experiment, however, we 
should probably prefer to forego this advantage, 
since it would occasionally occur that all the cups 
would be treated alike, and this, besides bewildering 
the subject by an unexpected occurrence, would deny 
her the real advantage of judging by comparison. 
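
The figures for the coin-tossing design can be checked in a few lines. This Python sketch is illustrative only ; it assumes 8 cups, each treated independently with equal chance, and the variable names are hypothetical.

```python
from math import comb

n_cups = 8
total = 2 ** n_cups                # equally likely sequences of treatments
all_right = comb(n_cups, 0)        # one way of making no mistake
one_wrong = comb(n_cups, 1)        # ways of making exactly one mistake

p_at_most_one_wrong = (all_right + one_wrong) / total
p_all_alike = 2 / total            # every cup receives the same treatment

print(total, one_wrong, p_at_most_one_wrong, p_all_alike)
```

The chance of at most one mistake is below one in twenty, which is the gain in sensitiveness ; the last quantity is the chance of the awkward occurrence in which all the cups are treated alike.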

Another possible alteration to the structure of the 
experiment, which would, however, decrease its 
sensitiveness, would be to present determined, but 
unequal, numbers of the two treatments. Thus we 
might arrange that 5 cups should be of the one kind 
and 3 of the other, choosing them properly by chance, 
and informing the subject how many of each to 
expect. But since the number of ways of choosing 
3 things out of 8 is only 56, there is now, on the null 
hypothesis, a probability of a completely correct 
classification of 1 in 56. It appears in fact that we 
cannot by these means do better than by presenting 
the two treatments in equal numbers, and the choice 
of this equality is now seen to be justified by its
giving to the experiment its maximal sensitiveness. 
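
That equal numbers give the greatest sensitiveness may be seen by counting the classifications available for each possible subdivision of the 8 cups. A minimal Python sketch, with hypothetical variable names :

```python
from math import comb

n_cups = 8
# Number of equally likely classifications when k cups are of one kind
# and the subject is told the value of k.
ways_by_split = {k: comb(n_cups, k) for k in range(n_cups + 1)}
best_split = max(ways_by_split, key=ways_by_split.get)

for k, ways in ways_by_split.items():
    print(k, n_cups - k, ways)     # split k : (8 - k) and its count
print(best_split)
```

The larger the count, the smaller the chance of a wholly correct classification on the null hypothesis, and the count is greatest for the equal subdivision.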

With respect to the refinements of technique, we 
have seen above that these contribute nothing to 
the validity of the experiment, and of the test of 
significance by which we determine its result. They 
may, however, be important, and even essential, in 
permitting the phenomenon under test to manifest 
itself. Though the test of significance remains valid, 
it may be that without special precautions even a
definite sensory discrimination would have little 
chance of scoring a significant success. If some 
cups were made with India and some with China 
tea, even though the treatments were properly 
randomised, the subject might not be able to discriminate
the relatively small difference in flavour
under investigation, when it was confused with the 
greater differences between leaves of different origin. 
Obviously, a similar difficulty could be introduced 
by using in some cups raw milk and in others boiled, 
or even condensed milk, or by adding sugar in unequal 
quantities. The subject has a right to claim, and it 
is in the interests of the sensitiveness of the experi- 
ment, that gross differences of these kinds should be 
excluded, and that the cups should, not as far as 
possible, but as far as is practically convenient, be 
made alike in all respects except that under test. 

How far such experimental refinements should be 
carried is entirely a matter of judgment, based on 
experience. The validity of the experiment is not 
affected by them. Their sole purpose is to increase 
its sensitiveness, and this object can usually be
achieved in many other ways, and particularly by 
increasing the size of the experiment. If, therefore, 
it is decided that the sensitiveness of the experiment 
should be increased, the experimenter has the choice 
between different methods of obtaining equivalent 
results ; and will be wise to choose whichever method 
is easiest to him, irrespective of the fact that previous 
experimenters may have tried, and recommended as 
very important, or even essential, various ingenious
and troublesome precautions. 
A HISTORICAL EXPERIMENT ON
GROWTH RATE 

13. We have illustrated a psycho-physical experiment, 
the result of which depends upon judgments, scored 
“ right ” or “ wrong,” and may be appropriately
interpreted by the method of the classical theory of 
probability. This method rests on the enumeration 
of the frequencies with which different combinations 
of right or wrong judgments will occur, on the 
hypothesis to be tested. We may now illustrate an 
experiment in which the results are expressed in 
quantitative measures, and which is appropriately 
interpreted by means of the theory of errors. 

In the introductory remarks to his book on “The 
effects of cross and self-fertilisation in the vegetable 
kingdom,” Charles Darwin gives an account of the 
considerations which guided him in the design of his 
experiments and in the presentation of his data, which 
will serve well to illustrate the principles on which 
biological experiments may be made conclusive. The 
passage is of especial interest in illustrating the 
extremely crude and unsatisfactory statistical methods 
available at the time, and the manner in which 
careful attention to commonsense considerations led 
to the adoption of an experimental design, in itself 
greatly superior to these methods of interpretation. 

14. Darwin’s Discussion of the Data 

“ I long doubted whether it was worth while to
give the measurements of each separate plant, but 
have decided to do so, in order that it may be seen 
that the superiority of the crossed plants over the 
self-fertilised, does not commonly depend on the 
presence of two or three extra fine plants on the one
side, or of a few very poor plants on the other side. 
Although several observers have insisted in general 
terms on the offspring from intercrossed varieties 
being superior to either parent-form, no precise 
measurements have been given ; and I have met with 
no observations on the effects of crossing and self-fertilising
the individuals of the same variety. Moreover,
experiments of this kind require so much time
— mine having been continued during eleven years — 
that they are not likely soon to be repeated. 

“ As only a moderate number of crossed and 
self-fertilised plants were measured, it was of great 
importance to me to learn how far the averages were 
trustworthy. I therefore asked Mr Galton, who has 
had much experience in statistical researches, to 
examine some of my tables of measurements, seven 
in number, namely those of Ipomoea, Digitalis,
Reseda lutea, Viola, Limnanthes, Petunia, and Zea.
I may premise that if we took by chance a dozen or
score of men belonging to two nations and measured 
them, it would I presume be very rash to form any 
judgment from such small numbers on their average 
heights. But the case is somewhat different with my 
crossed and self-fertilised plants, as they were of 
exactly the same age, were subjected from first to
last to the same conditions, and were descended from 
the same parents. When only from two to six pairs 
of plants were measured, the results are manifestly 
of little or no value, except in so far as they confirm 
and are confirmed by experiments made on a larger 
scale with other species. I will now give the report 
on the seven tables of measurements, which Mr Galton 
has had the great kindness to draw up for me.” 

15. Galton’s Method of Interpretation 

“ I have examined the measurements of the plants with 
care, and by many statistical methods, to find out how far 
the means of the several sets represent constant realities, 
such as would come out the same so long as the general 
conditions of growth remained unaltered. The principal 
methods that were adopted are easily explained by selecting 
one of the shorter series of plants, say of Zea mays, for an 
example. 

“ The observations as I received them are shown in 
columns II. and III., where they certainly have no prima
facie appearance of regularity. But as soon as we arrange
them in the order of their magnitudes, as in columns IV. 
and V., the case is materially altered. We now see, with 
few exceptions, that the largest plant on the crossed side in 
each pot exceeds the largest plant on the self-fertilised side, 
that the second exceeds the second, the third the third, and 
so on. Out of the fifteen cases in the table, there are only 
two exceptions to this rule.* We may therefore confidently 
affirm that a crossed series will always be found to exceed 
a self-fertilised series, within the range of the conditions 
under which the present experiment has been made. 

“ Next as regards the numerical estimate of this excess. 

* Galton evidently did not notice that this is true also before 
rearrangement. 
Zea mays (young plants)

TABLE 2

Pot.     Crossed.          Self-fert.     Difference.

I.       18 7/8            19 2/8         +0 3/8
II.      20 7/8            19             -1 7/8
III.     21 1/8            16 7/8         -4 2/8
IV.      19 [illegible]    16             -3 [illegible]


The mean values of the several groups are so discordant, 
as is shown in the table just given, that a fairly precise 
numerical estimate seems impossible. But the consideration 
arises, whether the difference between pot and pot may not 
be of much the same order of importance as that of the 
other conditions upon which the growth of the plants has 
been modified. If so, and only on that condition, it would 
follow that when all the measurements, either of the crossed 
or the self-fertilised plants, were combined into a single 
series, that series would be statistically regular. The 
experiment is tried in columns VII. and VIII., where the 
regularity is abundantly clear, and justifies us in considering 
its mean as perfectly reliable. I have protracted these 
measurements, and revised them in the usual way, by drawing 
a curve through them with a free hand, but the revision 
barely modifies the means derived from the original observa- 
tions. In the present, and in nearly all the other cases, the 
difference between the original and revised means is under 
2 per cent. of their value. It is a very remarkable coincidence
that in the seven kinds of plants, whose measurements I 
have examined, the ratio between the heights of the crossed 
and of the self-fertilised ranges in five cases within very 
narrow limits. In Zea mays it is as 100 to 84, and in the
others it ranges between 100 to 76 and 100 to 86. 

“ The determination of the variability (measured by what 
is technically called the ‘ probable error ’) is a problem of 
more delicacy than that of determining the means, and I 
doubt, after making many trials, whether it is possible to 
derive useful conclusions from these few observations. We
ought to have measurements of at least fifty plants in each
case, in order to be in a position to deduce fair results. . . ”

“ Mr Galton sent me at the same time graphical 
representations which he had made of the measure- 
ments, and they evidently form fairly regular curves. 
He appends the words ‘ very good ’ to those of Zea
and Limnanthes. He also calculated the average 
height of the crossed and self-fertilised plants in the 
seven tables by a more correct method than that 
followed by me, namely, by including the heights, 
as estimated in accordance with statistical rules, of a 
few plants which died before they were measured ; 
whereas I merely added up the heights of the sur- 
vivors, and divided the sum by their number. The 
difference in our results is in one way highly satis- 
factory, for the average heights of the self-fertilised 
plants, as deduced by Mr Galton, is less than mine in 
all the cases excepting one, in which our averages are 
the same ; and this shows that I have by no means 
exaggerated the superiority of the crossed over the 
self-fertilised plants.” 

16. Pairing and Grouping 

It is seen that the method of comparison adopted 
by Darwin is that of pitting each self-fertilised plant 
against a cross-fertilised one, in conditions made as 
equal as possible. The pairs so chosen for comparison 
had germinated at the same time, and the soil con- 
ditions in which they grew were largely equalised by 
planting in the same pot. Necessarily they were not of 
the same parentage, as it would be difficult in maize to
self-fertilise two plants, at the same time as raising a 
cross-fertilised progeny from the pair. However, the 
parents were presumably grown from the same batch 
of seed. The evident object of these precautions is to 
increase the sensitiveness of the experiment, by making 
such differences in growth rate as were to be observed 
as little as possible dependent on environmental
circumstances, and as much as possible, therefore,
on intrinsic differences due to their mode of origin.

The method of pairing, which is much used in 
modern biological work, illustrates well the way in 
which an appropriate experimental design is able to 
reconcile two desiderata, which sometimes appear to 
be in conflict. On the one hand we require the 
utmost uniformity in the biological material, which 
is the subject of experiment, in order to increase the 
sensitiveness of each individual observation ; and, 
on the other, we require to multiply the observations 
so as to demonstrate as far as possible the reliability 
and consistency of the results. Thus an experimenter 
with field crops may desire to replicate his experiments 
upon a large number of plots, but be deterred by the 
consideration that his facilities allow him to sow only 
a limited area on the same day. An experimenter 
with small mammals may have only a limited supply
of an inbred and highly uniform stock, which he 
believes to be particularly desirable for experimental 
purposes. Or, he may desire to carry out his experi- 
ments on members of the same litter, and feel that 
his experiment is limited by the size of the largest 
litter he can obtain. It has, indeed, frequently been
argued that, beyond a certain moderate degree, 
further replication can give no further increase in 
precision, owing to the increasing heterogeneity with 
which, it is thought, it must be accompanied. In all 
these cases, however, and in the many analogous cases 
which constantly arise, there is no real dilemma. 
Uniformity is only requisite between the objects 
whose response is to be contrasted (that is, objects 
treated differently). It is not requisite that all the 
parallel plots under the same treatment shall be 
sown on the same day, but only that each such plot
shall be sown as far as possible simultaneously with 
the differently treated plot or plots with which it is to 
be compared. If, therefore, only two kinds of treat- 
ments are under examination, pairs of plots may be 
chosen, one plot for each treatment ; and the precision 
of the experiment will be given its highest value if 
the members of each pair are treated closely alike, 
but will gain nothing from similarity of treatment 
applied to different pairs, nor lose anything if the 
conditions in these are somewhat varied. In the 
same way, if the numbers of animals available from 
any inbred line are too few for adequate replication, 
the experimental contrasts in treatments may be 
applied to pairs of animals from different inbred
lines, so long as each pair belongs to the same line. 
In these two cases it is evident that the principle of 
combining similarity between controls to be com- 
pared, with diversity between parallels, may be 
extended to cases where three or more treatments 
are under investigation. The requirement that
animals to be contrasted must come from the same 
litter limits, not the amount of replication, but the 
number of different treatments that can be so tested. 
Thus we might test three, but not so easily four or 
five treatments, if it were necessary that each set of 
animals must be of the same sex and litter. Paucity
of homogeneous material limits the number of different
treatments in an experiment, not the number of
replications. It may cramp the scope and compre- 
hensiveness of an experimental enquiry, but sets no 
limit to its possible precision. 

17. “ Student’s ” t Test *

* A full account of this test in more varied applications, and the
tables for its use will be found in Statistical Methods for Research
Workers. Its originator, who published anonymously under the
pseudonym “ Student,” possesses the remarkable distinction that,
without being a professed mathematician, he made early in life this
revolutionary refinement of the classical theory of errors.

Owing to the historical accident that the theory
of errors, by which quantitative data are to be
interpreted, was developed without reference to experi-
mental methods, the vital principle has often been
overlooked that the actual and physical conduct of
an experiment must govern the statistical procedure
of its interpretation. In using the theory of errors
we rely for our conclusion upon one or more estimates
of error, derived from the data, and appropriate to
the one or more sets of comparisons which we wish
to make. Whether these estimates are valid, for the
purpose for which we intend them, depends on what
has been actually done. It is possible, and indeed it
is all too frequent, for an experiment to be so con- 
ducted that no valid estimate of error is available. 
In such a case the experiment cannot be said, strictly, 
to be capable of proving anything. Perhaps it should 
not, in this case, be called an experiment at all, but 
be added merely to the body of experience on which, 
for lack of anything better, we may have to base our 
opinions. All that we need to emphasise immediately 
is that, if an experiment does allow us to calculate a 
valid estimate of error, its structure must completely 
determine the statistical procedure by which this 
estimate is to be calculated. If this were not so, no 
interpretation of the data could ever be unambiguous ; 
for we could never be sure that some other equally 
valid method of interpretation would not lead to a 
different result. 

The object of the experiment is to determine 
whether the difference in origin between inbred and 
cross-bred plants influences their growth rate, as 
measured by height at a given date ; in other words, 
if the numbers of the two sorts of plants were to be 
increased indefinitely, our object is to determine 
whether the average heights, to which these two 
aggregates of plants will tend, are equal or unequal. 
The most general statement of our null hypothesis is 
therefore that the limits to which these two averages 
tend are equal. The theory of errors enables us to 
test a somewhat more limited hypothesis, which, by 
wide experience, has been found to be appropriate to 
the metrical characters of experimental material in 
biology. The disturbing causes which introduce discrepancies
in the means of measurements of similar
material are found to produce quantitative effects 
which conform satisfactorily to a theoretical distribu- 
tion known as the normal law of frequency of error. 
It is this circumstance that makes it appropriate to 
choose, as the null hypothesis to be tested, one for 
which an exact statistical criterion is available, namely 
that the two groups of measurements are samples 
drawn from the same normal population. On the 
basis of this hypothesis we may proceed to compare 
the average difference in height, between the cross- 
fertilised and the self-fertilised plants, with such 
differences as might be expected between these 
averages, in view of the observed discrepancies 
between the heights of plants of like origin. 

We must now see how the adoption of the method 
of pairing determines the details of the arithmetical 
procedure, so as to lead to an unequivocal interpreta- 
tion. The pairing procedure, as indeed was its 
purpose, has equalised any differences in soil condi- 
tions, illumination, air-currents, etc., in which the 
several pairs of individuals may differ. Such 
differences having been eliminated from the experi- 
mental comparisons, and contributing nothing to the 
real errors of our experiment, must, for this reason, 
be eliminated likewise from our estimate of error, 
upon which we are to judge what differences between 
the means are compatible with the null hypothesis, 
and what differences are so great as to be incompatible 
with it. We are therefore not concerned with the
differences in height among plants of like origin, but
only with differences in height between members of 
the same pair, and with the discrepancies among 
these differences observed in different pairs. Our first 
step, therefore, will be to subtract from the height of 
each cross-fertilised plant the height of the self- 
fertilised plant belonging to the same pair. The 
differences are shown below in eighths of an inch. 
With respect to these differences our null hypothesis
asserts that they are normally distributed about a 
mean value at zero, and we have to test whether 
our 15 observed differences are compatible with the 
supposition that they are a sample from such a 
population. 

TABLE 3

Differences in eighths of an inch between cross- and
self-fertilised plants of the same pair

     49      23      56
    -67      28      24
      8      41      75
     16      14      60
      6      29     -48


The calculations needed to make a rigorous test 
of the null hypothesis stated above, involve no more 
than the sum, and the sum of the squares, of these 
numbers. The sum is 314, and, since there are 15
plants, the mean difference is 20 14/15 in favour of the
cross-fertilised plants. The sum of the squares is
26,518, and from this is deducted the product of
the total and the mean, or 6573, leaving 19,945
for the sum of squares of deviations from the
mean, representing discrepancies among the differences
observed in the 15 pairs. The algebraic fact here
used is that

$$S(x - \bar{x})^2 = S(x^2) - \bar{x}\,S(x)$$

where S stands for summation over the sample, and
$\bar{x}$ for the mean value of the observed differences.

We may make from this measure of the dis- 
crepancies an estimate of a quantity known as the 
variance of an individual difference, by dividing by 
14, one less than the number of pairs observed. 
Equally, and what is more immediately required, we 
may make an estimate of the variance of the mean 
of 15 such pairs, by dividing again by 15, a process
which yields 94.976 as the estimate. The square root
of the variance is known as the standard error, and it
is by the ratio which our observed mean difference
bears to its standard error that we shall judge of its
significance. Dividing our difference, 20.933, by its
standard error 9.746, we find this ratio (which is
usually denoted by t) to be 2.148.
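The arithmetic of the last two paragraphs may be retraced by a short computation. What follows is a minimal sketch in Python, assuming only the 15 differences of Table 3 (their sum, 314, and sum of squares, 26,518, agree with the figures quoted in the text); the variable names are introduced here for illustration and form no part of the original account.

```python
import math

# The 15 differences of Table 3, in eighths of an inch.
differences = [49, 23, 56, -67, 28, 24, 8, 41, 75, 16, 14, 60, 6, 29, -48]

n = len(differences)
total = sum(differences)                    # S(x)
mean = total / n                            # mean difference
sum_sq = sum(d * d for d in differences)    # S(x^2)

# Sum of squares of deviations from the mean, using the identity
# S(x - mean)^2 = S(x^2) - mean * S(x).
ss_dev = sum_sq - mean * total

var_single = ss_dev / (n - 1)   # variance of an individual difference
var_mean = var_single / n       # variance of the mean of n differences
std_error = math.sqrt(var_mean)
t = mean / std_error

print(total, sum_sq)
print(round(mean, 3), round(var_mean, 3), round(std_error, 3), round(t, 3))
```

The divisor n - 1 = 14 in the first division is the number of independent discrepancies among the 15 differences, which is why it, and not 15, is used.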

The object of these calculations has been to 
obtain from the data a quantity measuring the 
average difference in height between the cross- 
fertilised and the self-fertilised plants, in terms of the 
observed discrepancies among these differences ; and 
which, moreover, shall be distributed in a known 
manner when the null hypothesis is true. The
mathematical distribution for our present problem 
was discovered by “ Student ” in 1908, and depends 
only upon the number of independent comparisons 
(or the number of degrees of freedom) available for 
calculating the estimate of error. With 15 observed
differences we have among them 14 independent discrepancies,
and our degrees of freedom are 14. The
available tables of the distribution of t show that for
14 degrees of freedom the value 2.145 is exceeded
by chance, either in the positive or negative direction,
in exactly 5 per cent. of random trials. The observed
value of t, 2.148, thus just exceeds the 5 per cent.
point, and the experimental result may be judged
significant, though barely so.
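Where printed tables of t are not at hand, the tail probability may be approximated numerically. The sketch below is an assumption-laden illustration and not the method by which the tables were constructed: it integrates the density of "Student's" distribution by Simpson's rule, and the function names t_density and two_sided_p are hypothetical.

```python
import math

def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=20000):
    """P(|T| >= |t|): one minus the central area, found by Simpson's rule.

    steps must be even for Simpson's rule.
    """
    t = abs(t)
    h = t / steps
    area = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_density(i * h, df)
    central = 2 * area * h / 3      # area between -t and +t
    return 1 - central

# The tabulated 5 per cent. point and the observed value, 14 degrees of freedom.
print(round(two_sided_p(2.145, 14), 4))
print(round(two_sided_p(2.148, 14), 4))
```

The first figure should lie very close to 0.05, and the second a little below it, in agreement with the statement that the observed value just exceeds the 5 per cent. point.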

18. Fallacious Use of Statistics 

We may now see that Darwin’s judgment was 
perfectly sound, in judging that it was of importance 
to learn how far the averages were trustworthy, and 
that this could be done by a statistical examination 
of the tables of measurements of individual plants, 
though not of their averages. The example chosen, 
in fact, falls just on the border-line between those 
results which can suffice by themselves to establish 
the point at issue, and those which are of little value 
except in so far as they confirm or are confirmed by 
other experiments of a like nature. In particular, it 
is to be noted that Darwin recognised that the 
reliability of the result must be judged by the con- 
sistency of the superiority of the crossed plants over 
the self-fertilised, and not only on the difference of 
the averages, which might depend, as he says, on the 
presence of two or three extra-fine plants on the one 
side, or of a few very poor plants on the other side ; 
and that therefore the presentation of the experimental
evidence depended essentially on giving the
measurements of each independent plant, and could
not be assessed from the mere averages. 

It may be noted also that Galton’s scepticism of 
the value of the probable error, deduced from only 
15 pairs of observations, though, as it turned out, 
somewhat excessive, was undoubtedly right in 
principle. The standard error (of which the probable 
error is only a conventional fraction) can only be 
estimated with considerable uncertainty from so
small a sample, and, prior to “ Student’s ” solution 
of the problem, it was by no means clear to what 
extent this uncertainty would invalidate the test of 
significance. From “ Student’s ” work it is now 
known that the cause for anxiety was not so great 
as it might have seemed. Had the standard error 
been known with certainty, or derived from an 
effectively infinite number of observations, the 
5 per cent. value of t would have been 1.960. When
our estimate is based upon only 15 differences, the
5 per cent. value, as we have seen, is 2.145, or less
than 10 per cent. greater. Even using the inexact
theory available at the time, a calculation of the 
probable error would have provided a valuable guide 
to the interpretation of the results. 

19. Manipulation of the Data 

A much more serious fallacy appears to be involved 
in Galton’s assumption that the value of the data, 
for the purpose for which they were intended, could 
be increased by re-arranging the comparisons. 
Modern statisticians are familiar with the notions
that any finite body of data contains only a limited
amount of information, on any point under examina- 
tion ; that this limit is set by the nature of the data 
themselves, and cannot be increased by any amount 
of ingenuity expended in their statistical examina- 
tion : that the statistician’s task, in fact, is limited 
to the extraction of the whole of the available 
information on any particular issue. If the results 
of an experiment, as obtained, are in fact irregular,
this evidently detracts from their value ; and the 
statistician is not elucidating but falsifying the facts, 
who re-arranges them so as to give an artificial 
appearance of regularity. 

In re-arranging the results of Darwin’s experiment 
it appears that Galton thought that Darwin’s experiment
would be equivalent to one in which the heights
of pairs of contrasted plants had been those given in
his columns headed VI. and VII., and that the
reliability of Darwin’s average difference of about
2⅝ inches could be fairly judged from the constancy
of the 15 differences shown in column VIII. 

How great an effect this procedure, if legitimate, 
would have had on the significance of the result, 
may be seen by treating these artificial differences as 
we have treated the actual differences given by 
Darwin. Applying the same arithmetical procedure 
as before, we now find t equals 5.171, a value which
would be exceeded by chance only about once or twice
in 10,000 trials, and is far beyond the level of significance
ordinarily required. The falsification, inherent
in this mode of procedure, will be appreciated, if we
consider that the tallest plant, of either the crossed
or the self-fertilised series, will have become the 
tallest by reason of a number of favourable circum- 
stances, including among them those which produce 
the discrepancies between those pairs of plants, which 
were actually grown together. By taking the differ- 
ence between these two favoured plants we have 
largely eliminated real causes of error which have 
affected the value of our observed mean. We have, 
in doing this, grossly violated the principle that the 
estimate of error must be based on the effects of the 
very same causes of variation as have produced the 
real errors in our experiment. Through this fallacy 
Galton is led to speak of the mean as perfectly reliable, 
when, from its standard error, it appears that a 
repetition of the experiment would often give a mean 
quite 50 per cent. greater or less than that observed in
this case. 

20. Validity and Randomisation 

Having decided that, when the structure of the 
experiment consists in a number of independent 
comparisons between pairs, our estimate of the error 
of the average difference must be based upon the 
discrepancies between the differences actually ob- 
served, we must next enquire what precautions are 
needed in the practical conduct of the experiment
to guarantee that such an estimate shall be a valid 
one ; that is to say that the very same causes that 
produce our real error shall also contribute the 
materials for computing an estimate of it. The logical 
necessity of this requirement is readily apparent, for,
if causes of variation which do not influence our real
error are allowed to affect our estimate of it, or equally, 
if causes of variation affect the real error in such a 
way as to make no contribution to our estimate, this 
estimate will be vitiated, and will be incapable of 
providing a correct statement as to the frequency 
with which our real error will exceed any assigned 
quantity ; and such a statement of frequency is the 
sole purpose for which the estimate is of any use. 
Nevertheless, though its logical necessity is easily 
apprehended, the question of the validity of the 
estimates of error used in tests of significance was for
long ignored, and is still often overlooked in practice. 
One reason for this is that standardised methods of 
statistical analysis have been taken over ready-made 
from a mathematical theory, into which questions 
of experimental detail do not explicitly enter. In 
consequence the assumptions which enter implicitly 
into the bases of the theory have not been brought 
prominently under the notice of practical experi- 
menters. A second reason is that it has not until 
recently been recognised that any simple precaution 
would supply an absolute guarantee of the validity 
of the calculations. 

In the experiment under consideration, apart from 
chance differences in the selection of seeds, the sole 
source of the experimental error in the average of our 
fifteen differences lies in the differences in soil fertility, 
illumination, evaporation, etc., which make the site 
of each crossed plant more or less favourable to 
growth than the site assigned to the corresponding
self-fertilised plant. It is for this reason that every
precaution, such as mixing the soil, equalising the 
watering and orienting the pot so as to give equal 
illumination, may be expected to increase the precision 
of the experiment. If, now, when the fifteen pairs of 
sites have been chosen, and in so doing all the 
differences in environmental circumstances, to which 
the members of the different pairs will be exposed 
during the course of the experiment, have been 
predetermined, we then assign at random, as by 
tossing a coin, which site shall be occupied by the 
crossed and which by the self-fertilised plant, we 
shall be assigning by the same act whether this 
particular ingredient of error shall appear in our 
average with a positive or a negative sign. Since 
each particular error has thus an equal and
independent chance of being positive or negative, 
the error of our average will necessarily be distributed 
in a sampling distribution, centred at zero, which 
will be symmetrical in the sense that to each possible 
positive error there corresponds an equal negative 
error, which, as our procedure guarantees, will in fact 
occur with equal probability. 

Our estimate of error is easily seen to depend 
only on the same fifteen ingredients, and the arith- 
metical processes of summation, subtraction and 
division may be designed, and have in fact been
designed, so as to provide the estimate appropriate 
to the system of chances which our method of choosing 
sites had imposed on the data. This is to say much 
more than merely that the experiment is unbiased,
for we might still call the experiment unbiased if the
whole of the cross-fertilised plants had been assigned 
to the west side of the pots, and the self-fertilised 
plants to the east side, by a single toss of the coin. 
That this would be insufficient to ensure the validity 
of our estimate may be easily seen ; for it might well 
be that some unknown circumstance, such as the 
incidence of different illumination at different times 
of the day, or the desiccating action of the air-currents 
prevalent in the greenhouse, might systematically 
favour all the plants on one side over those on the 
other. The effect of any such prevailing cause would 
then be confounded with the advantage, real or 
apparent, of cross-breeding over inbreeding, and 
would be eliminated from our estimate of error, 
which is based solely on the discrepancies between 
the differences shown by different pairs of plants. 
Randomisation properly carried out, in which each 
pair of plants is assigned their positions independently 
at random, ensures that the estimates of error will 
take proper care of all such causes of different growth 
rates, and relieves the experimenter from the anxiety 
of considering and estimating the magnitude of the 
innumerable causes by which his data may be 
disturbed. The one flaw in Darwin’s procedure was 
the absence of randomisation. 

Had the same measurements been obtained from 
pairs of plants properly randomised the experiment 
would, as we have shown, have fallen on the verge 
of significance. Galton was led greatly to over-estimate
its conclusiveness through the major error
of attempting to estimate the reliability of the
comparisons by re-arranging the two series in order 
of magnitude. His discussion shows, in other respects, 
an over-confidence in the power of statistical methods 
to remedy the irregularities of the actual data. In 
particular the attempt mentioned by Darwin to 
improve on the simple averages of the two series 
“ by a more correct method . . . by including the 
heights, as estimated in accordance with statistical 
rules, of a few plants which died before they were 
measured,” seems to go far beyond the limits of 
justifiable inference, and is one of many indications 
that the logic of statistical induction was in its 
infancy, even at a time when the technique of 
accurate experimentation had already been consider- 
ably advanced. 

21. Test of a Wider Hypothesis

It has been mentioned that “ Student’s ” t test, in
conformity with the classical theory of errors, is
appropriate to the null hypothesis that the two
groups of measurements are samples drawn from the 
same normally distributed population. This is the 
type of null hypothesis which experimenters, rightly 
in the author’s opinion, usually consider it appropriate 
to test, for reasons not only of practical convenience, 
but because the unique properties of the normal 
distribution make it alone suitable for general applica- 
tion. There has, however, in recent years, been a 
tendency for theoretical statisticians, not closely in
touch with the requirements of experimental data, to
stress the element of normality, in the hypothesis
tested, as though it were a serious limitation to the 
test applied. It is, indeed, demonstrable that, as a 
test of this hypothesis, the exactitude of “ Student’s ” 
t test is absolute. It may, nevertheless, be legitimately 
asked whether we should obtain a materially different 
result were it possible to test the wider hypothesis 
which merely asserts that the two series are drawn 
from the same population, without specifying that 
this is normally distributed. 

In these discussions it seems to have escaped 
recognition that the physical act of randomisation, 
which, as has been shown, is necessary for the 
validity of any test of significance, affords the means, 
in respect of any particular body of data, of examining 
the wider hypothesis in which no normality of 
distribution is implied. The arithmetical procedure 
of such an examination is tedious, and we shall only 
give the results of its application as showing the 
possibility of an independent check on the more 
expeditious methods in common use. 

On the hypothesis that the two series of seeds are 
random samples from identical populations, and that 
their sites have been assigned to members of each 
pair independently at random, the 15 differences 
of Table 3 would each have occurred with equal 
frequency with a positive or with a negative sign. 
Their sum, taking account of the two negative signs 
which have actually occurred, is 314, and we may ask 
how many of the $2^{15}$ numbers, which may be formed
by giving each component alternatively a positive and
a negative sign, exceed this value. Since ex hypothesi
each of these $2^{15}$ combinations will occur by chance
with equal frequency, a knowledge of how many of 
them are equal to or greater than the value actually 
observed affords a direct arithmetical test of the 
significance of this value. 

It is easy to see that if there were no negative signs, 
or only one, every possible combination would exceed
314, while if the negative signs are 7 or more, every
possible combination will fall short of this value. The 
distribution of the cases, when there are from 2 to 6 
negative values, is shown in the following table : — 


TABLE 4

Number of combinations of differences, positive or negative,
which exceed or fall short of the total observed

Number of negative
values.                 > 314     = 314     < 314     Total.

0                           1        ..        ..          1
1                          15        ..        ..         15
2                          94         1        10        105
3                         263         3       189        455
4                         302        11     1,052      1,365
5                         138        12     2,853      3,003
6                          20         1     4,984      5,005
7 or more                  ..        ..    22,819     22,819

Total                     833        28    31,907     32,768


In just 861 cases out of 32,768 the total deviation 
will have a positive value as great as or greater than 
that observed. In an equal number of cases it will 
have as great a negative value. The two groups 
together constitute 5.255 per cent. of the possibilities
available, a result very nearly equivalent to that
obtained using the t test with the hypothesis of a
normally distributed population. Slight as it is,
indeed, the difference between the tests of these two
hypotheses is partly due to the continuity of the
t distribution, which effectively counts only half of
the 28 cases which give a total of exactly 314, as being 
as great or greater than the observed value. 
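The enumeration described above, though tedious by hand, is easily mechanised. The following is a minimal sketch in Python, assuming the 15 differences of Table 3 and giving each magnitude a positive or a negative sign in every one of the 2^15 possible ways; the variable names are illustrative. No expected output is asserted here: the counts it prints are offered as an independent check to be set beside the entries of Table 4.

```python
from itertools import product

# The 15 differences of Table 3, in eighths of an inch.
differences = [49, 23, 56, -67, 28, 24, 8, 41, 75, 16, 14, 60, 6, 29, -48]
magnitudes = [abs(d) for d in differences]
observed = sum(differences)

greater = equal = less = 0
# Every assignment of + or - to the 15 magnitudes is equally likely
# under the hypothesis, given independent randomisation within pairs.
for signs in product((1, -1), repeat=len(magnitudes)):
    total = sum(s * m for s, m in zip(signs, magnitudes))
    if total > observed:
        greater += 1
    elif total == observed:
        equal += 1
    else:
        less += 1

n_cases = 2 ** len(magnitudes)
# By symmetry, as many totals are as far below zero as above it.
two_sided = 2 * (greater + equal) / n_cases

print(greater, equal, less, n_cases)
print(round(100 * two_sided, 3), "per cent.")
```

The test so obtained makes no appeal to the normal law; it rests only on the equal chance of each sign, which the act of randomisation supplies.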

Both tests prove that, in about 5 per cent. of trials,
samples from the same batch of seed would show 
differences just as great, and as regular, as those 
observed ; so that the experimental evidence is 
scarcely sufficient to stand alone. In conjunction with 
other experiments, however, showing a consistent 
advantage of cross-fertilised seed, the experiment has 
considerable weight ; since only once in 40 trials 
would a chance deviation have been observed both 
so large, and in the right direction. 

How entirely appropriate to the present problem 
is the use of the distribution of t, based on the theory 
of errors, when accurately carried out, may be seen 
by inserting an adjustment, which effectively allows 
for the discontinuity of the measurements. This 
adjustment is not usually of practical importance, 
with the t test, and is only given here to show the 
close similarity of the results of testing the two hypotheses,
in one of which the errors are distributed
according to the normal law, whereas in the other 
they may be distributed in any conceivable manner. 
The adjustment * consists in calculating the value of t
as though the total difference between the two sets of
measurements were less than that actually observed
by half a unit of measurement ; i.e. as if it were
313.5 instead of 314. The value of t is then found
to be 2.143 instead of 2.148. The following table
shows the effect of the adjustment on the test of
significance, and its relation to the test of the more
general hypothesis.

* This adjustment is an extension to the distribution of t of Yates’
adjustment for continuity, which is of greater importance in the
distribution of χ², for which it was developed.


TABLE 5

                                             Probability of a Positive
                                             Difference exceeding that
                                     t.      observed.

Normal hypothesis, unadjusted      2.148     2.491 per cent.
Normal hypothesis, adjusted        2.143     2.514    ,,
General hypothesis                   ..      2.628    ,,


The difference between the two hypotheses is thus 
equivalent to little more than a probability of one in 
a thousand.

AN AGRICULTURAL EXPERIMENT IN
RANDOMISED BLOCKS 

22. Description of the Experiment 

In pursuance of the principles indicated by the
discussions in the previous chapters we may now take
an example from agricultural experimentation, the 
branch of the subject in which these principles have 
so far been most explicitly developed, and in which 
the advantages and disadvantages of the different 
methods open to the experimenter may be most 
clearly discussed. 

We will suppose that our experiment is designed 
to test the relative productivity, or yield, of five 
different varieties of a farm crop ; and that a decision 
has already been arrived at as to what produce shall 
be regarded as yield. In the case of cereal crops, for 
example, we may decide to measure the yield as total 
grain, or as grain sufficiently large not to pass a 
specified sieve, or as grain and straw valued together 
at predetermined prices, or in whatever method may 
be deemed appropriate for the purposes of the 
experiment. Our object is to determine whether, on 
the soil or in the climatic conditions experienced by 
the test, any of the varieties tested yield more than 
others, and, if so, to evaluate the differences with a 
determinate degree of precision.

We shall suppose that the experimental area is
divided into eight compact, or approximately square, 
blocks, and that each of these is divided into five plots 
running from end to end of the block, and lying side 
by side, making forty plots in all. Apart from the 
differences in variety to be used, the whole area is 
to have uniform agricultural treatment. At harvest, 
narrow edges about a foot in width for cereal crops, 
or the width of a single row for larger plants, such as 
roots and potatoes, are to be discarded from experimental
yields ; the central portions, cut to be of equal
area, are to be harvested, and the produce weighed, 
or, if preferred, measured in some other manner. 

In each block the five plots are assigned one to 
each of the five varieties under test, and this assign- 
ment is made at random. This does not mean that 
the experimenter writes down the names of the 
varieties, or letters standing for them, in any order 
that may occur to him, but that he carries out a
physical experimental process of randomisation, using 
means which shall ensure that each variety has an 
equal chance of being tested on any particular plot 
of ground. A satisfactory method is to use a pack 
of cards numbered from 1 to 100, and to arrange
them in random order by repeated shuffling. The
varieties are then numbered from 1 to 5, and any card
such as number 33, for example, is deemed to correspond
to variety number 3, because on dividing by
5 this number is found as the remainder. Numbers 
divisible by 5 will correspond to variety number 5. 
The order of varieties in each block may then be
quickly determined from the order of the cards in
the pack, after thoroughly shuffling. The remainder 
corresponding to any variety is disregarded after its 
first occurrence in the block. 

Since 5 is a divisor of a hundred, each variety will 
be represented by 20 cards, and the probabilities of 
each appearing in any particular place will be equal. 
If we had been randomising six varieties we should 
have used a number of cards divisible by 6, for 
example 96, and could, for this purpose, use the same 
pack as before, discarding the 4 cards numbered 97 to 100,
or indeed, any other four cards the numbers of which 
leave the remainders 1, 2, 3 and 4 on dividing by 6. 
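The card procedure just described may be imitated by a pseudo-random shuffle. The sketch below is only an illustration under stated assumptions: a software shuffle stands in for the physical shuffling of the pack, the pack is reshuffled afresh for each block (the text does not say whether one shuffle serves all blocks), and the names variety_for_card and randomise_blocks are hypothetical.

```python
import random

def variety_for_card(card, n_varieties=5):
    """Map a card number to a variety by its remainder on division;
    numbers exactly divisible correspond to the last variety."""
    remainder = card % n_varieties
    return remainder if remainder != 0 else n_varieties

def randomise_blocks(n_blocks=8, n_varieties=5, n_cards=100, rng=None):
    """Return, for each block, the order of varieties across its plots."""
    if rng is None:
        rng = random.Random()
    layout = []
    for _ in range(n_blocks):
        pack = list(range(1, n_cards + 1))
        rng.shuffle(pack)                  # stands in for repeated shuffling
        order = []
        for card in pack:
            variety = variety_for_card(card, n_varieties)
            if variety not in order:       # disregard after first occurrence
                order.append(variety)
            if len(order) == n_varieties:
                break
        layout.append(order)
    return layout

# One possible arrangement of 5 varieties in 8 blocks.
for block_number, order in enumerate(randomise_blocks(rng=random.Random(1)), start=1):
    print(block_number, order)
```

For six varieties the same function may be called with n_varieties=6 and n_cards=96, so that each remainder is again represented by an equal number of cards.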

To save the labour of card shuffling use is often 
made of printed tables of random sampling numbers, 
in which, for example, all numbers of 4 figures are 
arranged in random order. Starting at any point 
in such a table and proceeding in any direction, such 
as up or down the columns, or along the rows, we 
may take each pair of digits to represent the number 
of a card in the pack of 100, disregarding any which 
may be superfluous for our purpose. Using such 
means the process of randomisation is extremely rapid, 
and a chart showing the arrangement of the experi- 
ment may be prepared as quickly as if the varieties 
had been set out in a systematic order. 

23. Statistical Analysis of the Observations 

The arithmetical discussion by which the experi- 
ment is to be interpreted is known as the analysis of 
variance. This is a simple arithmetical procedure, by
means of which the results may be arranged and
presented in a single compact table, which shows 
both the structure of the experiment and the relevant 
results, in such a way as to facilitate the necessary 
tests of their significance. The structure of the 
experiment is determined when it is planned, and 
before the content of its results, consisting of the 
actual yields from the different plots, is known. It
depends on the number of varieties to be compared, 
on the number of replications of each obtainable, and 
on the system by which these are arranged; in our 
present example in randomised blocks. In its arith- 
metical aspect this structure is specified by the numbers 
of degrees of freedom, or of independent comparisons, 
which can be made between the plots, or relevant 
groups of plots. Between 40 plots 39 independent 
comparisons can be made, and so the total number 
of degrees of freedom will be 39. This number will 
be divided into 3 parts representing the numbers of 
independent comparisons (a) between varieties, (b)
between blocks, and (c) representing the discrepancies 
between the relative performances of different varieties 
in different blocks, which discrepancies provide a basis 
for the estimation of error. We may specify the structure 
of our typical experiment by a partition of the total of 
39 degrees of freedom into these three parts as under. 

TABLE 6

Structure of an Experiment in Randomised Blocks

Varieties  . . . .   4
Blocks     . . . .   7
Error      . . . .  28

Total      . . . .  39

It is easy to see that the number of degrees of
freedom for any group of simple comparisons, such as
those between varieties or between blocks, must be
1 less than the number of items to be compared. In
the present instance, in which the plots are assigned 
within the blocks wholly at random the whole of the 
remaining 28 degrees of freedom are due simply to 
differences in fertility between different plots within 
the same block, and are therefore available for
providing the estimate of error. As will be explained 
more fully later, many more complicated modes of 
subdivision of the total number of degrees of freedom 
may be employed, and will be appropriate to more 
complicated forms of experimental enquiry. The form
we have set out is appropriate to the question whether 
the yields given by the different varieties show, as 
a whole, greater differences than would ordinarily be 
found, had only a single variety been sown on the 
same land. It is appropriate to test the null hypo- 
thesis that our 5 varieties give in fact the same yields. 
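The partition of Table 6 may be written generally. As a sketch, writing v for the number of varieties and b for the number of blocks (symbols introduced here for illustration only, and not used in the text), the three parts add up to one less than the number of plots:

```latex
\underbrace{(v-1)}_{\text{varieties}}
+ \underbrace{(b-1)}_{\text{blocks}}
+ \underbrace{(v-1)(b-1)}_{\text{error}}
= vb - 1,
\qquad
v = 5,\ b = 8:\quad 4 + 7 + 28 = 39 .
```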

The completion of the analysis of variance, when 
the yields are known, must be strictly in accordance 
with the structure imposed by the design of the 
experiment, and consists in the partition of a quantity 
known as the sum of squares (i.e., of deviations from
the mean) into the same three parts as those into
which we have already divided the degrees of freedom.
Our data consist of 40 yields (y), 5 from each block
and 8 from each variety which, for further calculation, 
can be conveniently arranged in a table of 5 columns 
and 8 lines.

TABLE 7

Scheme for calculation of totals and means

                                            Total    Mean

            -     -     -     -     -         A        a
            -     -     -     -     -         B        b
            -     -     -     -     -         C        c
            -     -     -     -     -         D        d
            -     -     -     -     -         E        e
            -     -     -     -     -         F        f
            -     -     -     -     -         G        g
            -     -     -     -     -         H        h

Total       P     Q     R     S     T         M

Mean        p     q     r     s     t                  m


The totals of the five columns will then represent the 
totals of the yields obtained from the 5 varieties and 
may be designated by the capital letters P, Q, R, S 
and T. The mean yields from the five varieties, 
found by dividing these by 8, will be designated by 
small letters p, q, r, s and t. In like manner the 
totals of the rows will represent the total yields 
harvested from each of the blocks of land used ; we 
shall denote these by A, B, C, D, E, F, G, H, and 
the corresponding means by a, b, c, d, e, f, g, h.
Evidently, the totals of the rows and the totals of 
the columns are sub-totals of the same grand total, 
denoted by M, from which the general mean m is 
derived by dividing by 40. The arrangement is
illustrated in Table 7 ; Table 7A, p. 76, gives some
numerical observations in this form.

The sum of squares of deviations from the mean, 
which, in the analysis of variance, is to be divided into 
portions corresponding to varieties, blocks, and error,
is found by adding together the squares of the 40
recorded yields, and deducting the product M m. The 
difference, which corresponds to the total 39 degrees of 
freedom is actually the sum of the squares of the 40 
differences or deviations between the actual yields, y, 
and their general mean, m ; it therefore measures 
the total amount of variation due to all causes, 
observed between our different plots. The method 
we .have given, however, for obtaining this quantity 
is convenient for our purpose, for the product M m is 
used also in our calculation of the other entries of 
the table, which are indeed rapidly obtainable once 
the total sum of squares is known. The portion, 
for example, corresponding to the 4 degrees of 
freedom between varieties is found simply by summing 
the products Pp + Qq + . . ., and deducting Mm. 
Similarly, the portion ascribable to the 7 degrees of 
freedom between blocks is obtained by summing the 
products Aa + Bb + Cc + . . ., and deducting Mm. 
Knowing the contributions to the sum of squares of 
these 11 degrees of freedom, the amount corresponding 
to the remaining 28 degrees of freedom, due to error, 
may be found by subtraction from the total. In this 
way the total sum of squares, representing the total 
amount of variation due to all causes between the 
40 yields of the experiment, is divided into the 3 
portions relevant to its interpretation, measuring 
respectively the amount of variation between varieties, 
the amount of variation between blocks, and the 
amount of discrepancy between the performances of 
the different varieties in the different blocks. The 
greater part of the arithmetical labour is accomplished 
with the calculation of the total sum of 
squares. 
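
The arithmetic just described may be set out as a short Python sketch. It is offered only as an illustration: the function name randomised_block_partition and the small table of yields (4 blocks of 3 varieties, not the 8 blocks of 5 varieties of the text) are invented for the purpose and are not taken from any experiment.

```python
def randomised_block_partition(yields):
    """Partition the sum of squares for yields[block][variety]."""
    n_blocks = len(yields)
    n_varieties = len(yields[0])
    grand_total = sum(sum(row) for row in yields)           # M
    general_mean = grand_total / (n_blocks * n_varieties)   # m
    correction = grand_total * general_mean                 # the product Mm

    # Total: sum of the squared yields, less Mm.
    total_ss = sum(y * y for row in yields for y in row) - correction

    # Varieties: each column total times its mean (Pp + Qq + ...), less Mm.
    variety_totals = [sum(row[j] for row in yields) for j in range(n_varieties)]
    variety_ss = sum(t * (t / n_blocks) for t in variety_totals) - correction

    # Blocks: each row total times its mean (Aa + Bb + ...), less Mm.
    block_totals = [sum(row) for row in yields]
    block_ss = sum(t * (t / n_varieties) for t in block_totals) - correction

    # Error: found by subtraction from the total.
    error_ss = total_ss - variety_ss - block_ss

    # Each entry is (degrees of freedom, sum of squares).
    return {
        "varieties": (n_varieties - 1, variety_ss),
        "blocks": (n_blocks - 1, block_ss),
        "error": ((n_varieties - 1) * (n_blocks - 1), error_ss),
        "total": (n_blocks * n_varieties - 1, total_ss),
    }


# Invented yields: 4 blocks (rows) by 3 varieties (columns).
example_yields = [
    [10, 12, 14],
    [11, 13, 18],
    [9, 12, 15],
    [10, 15, 17],
]

for source, (df, ss) in randomised_block_partition(example_yields).items():
    print(f"{source:10s} d.f. = {df:2d}   sum of squares = {ss:.1f}")
```

The same product Mm serves as the deduction in all three lines, which is why the remaining entries follow quickly once the total has been found.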

Corresponding to the three sums of squares into 
which the total has been partitioned, we may now 
calculate the mean squares, by dividing each by the 
corresponding number of degrees of freedom. On 
the null hypothesis the mean squares for variety and 
error have the particularly simple interpretation that 
each may be regarded as an independent estimate of 
the same single quantity, the variance due to error of 
a single plot. If the varieties had in fact the same 
yield the mean square derived from the 4 degrees 
of freedom of varieties would have, on the average, 
the same value as that derived from the 28 degrees of 
freedom for error. In any one trial these values would 
indeed differ, but only by errors of random sampling. 
The relative precision of our estimates is determined 
solely by the number of degrees of freedom upon 
which each is based, so that, knowing these numbers, 
the ratio of any two estimates affords a test of 
significance. In other words, on the null hypothesis 
the random sampling distribution of this ratio is 
precisely known. Thus, in our example, it would 
happen just once in 20 trials that the estimate based 
on 4 degrees of freedom exceeded that based on 
28 degrees of freedom in a ratio greater than 2.714. 
If, therefore, the observed ratio exceeds this level, 
we have a measurable basis for confidence that the 
differences observed between the yields of the different 
varieties are not due wholly to the differences in 
fertility of the plots on which they were grown. 
Again, in only 1 per cent. of trials will the ratio 
exceed 4.074, a value which thus marks the level 
of a more severe test of significance. Since tests of 
the same kind will be required for all possible pairs 
of numbers of degrees of freedom, it is convenient for 
purposes of tabulation to use a criterion which varies 
more regularly than the arithmetical ratio employed 
in the illustrations above. It is usual therefore to 
carry out the calculation by using the natural logarithms 
of the mean squares, and since the difference 
of the two logarithms specifies the ratio of the 
corresponding numbers, the tables used in this test of 
significance give the values of a quantity, z, defined 
as half the difference between the natural logarithms 
obtained. 
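
The relation between z and the ratio of the mean squares can be illustrated by a few lines of Python. The function names fisher_z and ratio_from_z are chosen here for illustration only ; the two ratios are the 5 per cent. and 1 per cent. levels quoted above for 4 and 28 degrees of freedom.

```python
import math


def fisher_z(mean_square_1, mean_square_2):
    """Half the difference between the natural logarithms of two mean squares."""
    return 0.5 * (math.log(mean_square_1) - math.log(mean_square_2))


def ratio_from_z(z):
    """Recover the ratio of the two mean squares from a value of z."""
    return math.exp(2.0 * z)


# The ratios quoted in the text, expressed as values of z.
for ratio in (2.714, 4.074):
    z = fisher_z(ratio, 1.0)
    print(f"ratio {ratio:.3f}  corresponds to  z = {z:.4f}")
```

Since z depends only on the ratio, an observed pair of mean squares may be compared with a tabulated z, or the tabulated z converted back to a ratio, whichever is the more convenient.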

The z test may be regarded as an extension of the 
t test, appropriate to cases where more than two 
variants are to be compared. Like it, it is derived 
from the theory of errors, and is exact when the 
normal law of errors is realised. It is even less affected 
than the t test by such deviations from normality as 
are met with in practice. As with the t test, its 
appropriateness to any particular body of data may 
be verified arithmetically. Such verification is not 
ordinarily necessary and is always laborious. Often 
the number of random arrangements available is far 
too great for them to be examined exhaustively, as 
was done with Darwin’s experiment in Chapter III. 
Eden and Yates have, however, published a method 
of obtaining rapidly the results of a large random 
selection of these arrangements, and have demonstrated 
how closely the theoretical distribution was 
verified in material that was far from normal. 
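
The idea of examining a random selection of arrangements, rather than all of them, may be sketched as follows. This is a plain Monte Carlo illustration and makes no claim to reproduce the procedure of Eden and Yates ; the function names, the invented yields, and the choice of 2000 arrangements are all assumptions of the sketch. On the null hypothesis the varietal labels within each block are interchangeable, so the yields of each block are shuffled independently and the ratio of mean squares recomputed.

```python
import random


def variance_ratio(yields):
    """Ratio of the variety mean square to the error mean square."""
    b, v = len(yields), len(yields[0])
    grand_total = sum(sum(row) for row in yields)
    correction = grand_total * grand_total / (b * v)
    total_ss = sum(y * y for row in yields for y in row) - correction
    variety_ss = sum(sum(row[j] for row in yields) ** 2 for j in range(v)) / b - correction
    block_ss = sum(sum(row) ** 2 for row in yields) / v - correction
    error_ss = total_ss - variety_ss - block_ss
    if error_ss <= 0:  # perfectly additive table: the ratio is unbounded
        return float("inf")
    return (variety_ss / (v - 1)) / (error_ss / ((b - 1) * (v - 1)))


def randomisation_p_value(yields, n_samples=2000, seed=0):
    """Proportion of sampled within-block rearrangements whose ratio
    is at least as great as the ratio actually observed."""
    rng = random.Random(seed)
    observed = variance_ratio(yields)
    hits = 0
    for _ in range(n_samples):
        shuffled = [rng.sample(row, len(row)) for row in yields]
        if variance_ratio(shuffled) >= observed - 1e-9:
            hits += 1
    return hits / n_samples


# Invented yields: 4 blocks (rows) by 3 varieties (columns).
example_yields = [
    [10, 12, 14],
    [11, 13, 18],
    [9, 12, 15],
    [10, 15, 17],
]

print("observed ratio:", variance_ratio(example_yields))
print("sampled proportion:", randomisation_p_value(example_yields))
```

The proportion so obtained may be set beside the probability given by the z test ; close agreement is what the verification described above would lead one to expect.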

24. Precision of the Comparisons 

If the yields of the different varieties in the 
experiment fail to satisfy the test of significance they 
will not often need to be considered further, for the 
results, as so far tested, are compatible with the view 
that all differences observed in the experiment are due 
to variations in the fertility of the experimental area, 
and this is the simplest interpretation to put upon 
the results. If, however, a significant value of z has 
been obtained the null hypothesis has been falsified, 
and may therefore be set aside. We shall thereafter 
proceed to interpret the differences between the 
varietal yields as due at least in part to the inherent 
qualities of the varieties, as manifested on the condi- 
tions of the test, and shall be concerned to know with 
what precision these different yields have been 
evaluated. For this purpose the mean square, 
corresponding to the 28 degrees of freedom assigned 
to error, is available as an estimate of the variance 
of a single plot due to the uncontrolled causes which 
constitute the errors of our determinations. From 
this fundamental estimate we may derive a corre- 
sponding estimate of the variance of the sum of the 
yields from 8 plots by multiplying by 8, or, if we 
prefer, we may derive the variance of the mean yield 
of 8 plots by dividing by 8. In either case the square 
root of the variance gives the standard deviation, and 
provides therefore a means of judging which of the 
differences among our varietal yield values are 
sufficiently great to be regarded as well established, 
and which are to be regarded as probably fortuitous. 
If the experiment leaves any grounds for practical 
doubt, values may be compared by the t test mentioned 
in Chapter II., remembering that our estimate of 
error is based on 28 degrees of freedom. 
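
The derivation of these standard deviations from the error mean square is a matter of a few lines, sketched here in Python. The function name and the numerical value of the mean square are invented for illustration. The last entry, the standard deviation of the difference between two varietal means, is an addition not stated in the text ; it assumes the two means to be affected by independent errors.

```python
import math


def precision_from_error_mean_square(error_mean_square, n_plots):
    """Standard deviations derived from the variance of a single plot."""
    variance_of_total = error_mean_square * n_plots  # sum of n_plots yields
    variance_of_mean = error_mean_square / n_plots   # mean of n_plots yields
    return {
        "sd_of_total": math.sqrt(variance_of_total),
        "sd_of_mean": math.sqrt(variance_of_mean),
        # Added assumption: two means with independent errors.
        "sd_of_difference_of_means": math.sqrt(2.0 * variance_of_mean),
    }


# Invented error mean square of 8.0, with 8 plots to each variety.
for name, value in precision_from_error_mean_square(8.0, 8).items():
    print(f"{name:28s} {value:.4f}")
```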

It is an advantage of arrangements in randomised 
blocks that, corresponding to any particular com- 
parison contrast, the components of error appertaining 
to this comparison may be isolated. This is done 
simply by finding the difference in yield, or perform- 
ance, between the treatments, or groups of treatments, 
to be compared, in each replication of the experiment. 
The discrepancies between these differences obtained 
from different replications, taking account of their 
signs, constitute the components of error appropriate 
to this comparison, which may now be tested by a t 
test, independently of the other comparisons which 
the experiment affords. Although fewer degrees of 
freedom are available for the estimation of error from 
these components only, their isolation affords an 
additional safeguard when, as may sometimes occur, 
some comparisons are, in effect, less accurately 
evaluated than others. 
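
A minimal sketch of this isolation, for the comparison of two varieties, is given here in Python. The function name comparison_t and the yields are invented for illustration. The difference between the two varieties is taken in each block, and the discrepancies among these differences supply an estimate of error based on one fewer degrees of freedom than there are blocks.

```python
import math


def comparison_t(yields, first, second):
    """t for the difference between two varieties, using only the
    block-by-block differences between them."""
    differences = [row[second] - row[first] for row in yields]
    n = len(differences)
    mean_difference = sum(differences) / n
    sum_of_squares = sum((d - mean_difference) ** 2 for d in differences)
    variance = sum_of_squares / (n - 1)  # error variance of a single difference
    t = mean_difference / math.sqrt(variance / n)
    return t, n - 1                      # t and its degrees of freedom


# Invented yields: 4 blocks (rows) by 3 varieties (columns).
example_yields = [
    [10, 12, 14],
    [11, 13, 18],
    [9, 12, 15],
    [10, 15, 17],
]

t, df = comparison_t(example_yields, first=0, second=2)
print(f"t = {t:.3f} on {df} degrees of freedom")
```

The value of t so found is referred to the table of t with the reduced number of degrees of freedom, not with those of the general estimate of error.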

When the z test does not demonstrate significant 
differentiation, much caution should be used before 
claiming significance for special comparisons. Comparisons, 
which the experiment was designed to 
make, may, of course, be made without hesitation. 
It is comparisons suggested subsequently, by a 
scrutiny of the results themselves, that are open to 
suspicion ; for if the variants are numerous, a 
comparison of the highest with the lowest observed 
value, picked out from the results, will often appear 
to be significant, even from undifferentiated material. 
Properly, such unforeseen effects should be regarded 
only as suggestions for future experimentation, in 
which they can be deliberately tested. To form a 
preliminary opinion as to the strength of the evidence, 
it is sometimes useful to consider how many similar 
comparisons would have been from the start equally 
plausible. Thus, in comparing the best with the 
worst of ten tested varieties, we have chosen the pair 
with the largest apparent difference out of 45 pairs, 
which might equally have been chosen. We might, 
therefore, require the probability of the observed 
difference to be as small as 1 in 900, instead of 1 in 
20, before attaching statistical significance to the 
contrast. 
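
The arithmetic of this allowance is easily checked. The short Python sketch below counts the pairs available among ten varieties and divides the customary level of 1 in 20 by that number ; the function names are invented for illustration, and the simple division is the rough allowance described above, not an exact test.

```python
import math


def pairs_among(n_varieties):
    """Number of distinct pairs that could have been picked out."""
    return math.comb(n_varieties, 2)


def adjusted_level(nominal_level, n_comparisons):
    """Level of probability required of the selected comparison."""
    return nominal_level / n_comparisons


n_pairs = pairs_among(10)
level = adjusted_level(1 / 20, n_pairs)
print(f"{n_pairs} pairs; required probability about 1 in {round(1 / level)}")
```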

25. The Purposes of Replication 

An examination of the structure of the standard 
type of agricultural experiment described above, and 
of the use made of its structure in the statistical 
process of interpretation, shows that the replication or 
repetition of the varieties tested on different plots of 
land serves two distinct purposes. It serves first to 
diminish the error, a purpose which has been widely 
recognised, though the manner in which it does so has 
not always been well understood. In our experiment 
the sampling variance of a mean yield was found by 
dividing the estimate of variance for a single plot by 
8. Since the variance of a single plot would not be 
necessarily or systematically increased by increasing 
the number of blocks, the variance of our mean 
yields will generally fall off inversely to the number 
of replications included. In increasing the number 
of blocks, however, we should have increased the 
area of the experiment, and it is probable that this 
increase in area, even if we had used the same number 
of larger plots, would itself have served to diminish 
the experimental error. If the area of the experiment 
were kept constant and the replication increased by 
using smaller plots we should only gain in precision 
if, as abundant agricultural experimentation shows 
to be generally the case, the greater proximity of the 
smaller areas led to a greater similarity in the fertility 
of the soil. The practical limit to plot subdivision is 
set, in agricultural experiments, by the necessity of 
discarding a strip at the edge of each plot. The width 
of the strip depends on the competition of neighbouring 
plants for moisture, soil nutrients and light, and is 
independent of the size of the plots. Consequently, 
as smaller plots are used, a larger proportion of the 
experimental area has to be discarded. The soil 
heterogeneity of most experimental land is, however, 
so pronounced that it is profitable to discard a 
considerable proportion of the area, in order to bring 
the experimental treatments or varieties to be con- 
trasted more closely together than would otherwise be 
possible. With plants, such as potatoes and sugar-beets, 
where it is sufficient to discard a single row, 
one on each side of the plot, it has been repeatedly 
found that strips of 4 rows wide, of which only the 
central two are included in the yield, make a more 
economical use of a given experimental area than 
either wider or narrower plots would do. 

Replication, therefore, in the sense of the comminution 
of the experimental area, down to plots of 
the most efficient size, has an important but limited 
part to play in increasing the precision of an experi- 
ment. It should not, in this connection, be overlooked 
that many other factors contribute to the same result, 
such as accuracy in harvesting and weighing the 
produce ; in measuring the areas of land harvested ; 
care in the choice of the experimental area ; in insuring 
the similarity of the treatment of its different parts ; 
in safeguarding the crop against damage, and its 
produce against loss. All these factors contribute to 
the precision of the experiment, and though, when 
the conditions are otherwise favourable, there can 
be no doubt that attention is rightly concentrated on 
diminishing the important causes of error due to 
variations in soil fertility, it is evident that, even in 
experiments in which these causes were almost wholly 
eliminated, neglect of common-sense precautions, 
which, none the less, require care and supervision, 
may lead to entirely unreliable results. 

26. Validity of the Estimation of Error 

Whereas replication of the experimental varieties 
or treatments on different plots, formed by the 
subdivision of the experimental area, is of value as 
one of the means of increasing the accuracy of the 
experimental comparisons, its main purpose, which 
there is no alternative method of achieving, is to 
supply an estimate of error by which the significance 
of these comparisons is to be judged. The need of 
such an estimate may be perceived by considering the 
doubts with which the interpretation of an experiment 
would be involved if it consisted only of a single plot 
for each treatment. The treatment giving the highest 
yield would of course appear to be the best, but no 
one could say whether the plot would not in fact 
have yielded as well under some or all of the other 
treatments. If, indeed, the difference in yield appeared 
large to the experimenters they might argue that so 
large a difference could not reasonably be ascribed to 
a difference in soil fertility, since it was contrary to 
their experience that neighbouring plots treated alike 
should differ so greatly. To enforce this argument 
they would in fact have to claim that their past 
experience had already furnished a basis for the 
estimation of error, which could be applied with 
confidence to the circumstances of the experiment 
under discussion. Even if this claim could be granted, 
the experiment would carry with it the serious dis- 
advantage that it would no longer be self-contained, 
but would depend for its interpretation from experience 
previously gathered. It could no longer be expected 
to carry conviction to others lacking this supplementary 
experience. How weak the evidence of such 
previous experience must always be will be seen by 
considering that, even if the identical area of the 
experiment, divided into the same plots, had been 
harvested under uniform treatment in previous years, 
it would need twenty years’ experience to form even 
the roughest judgment as to how great a difference 
between the yields of any two plots would occur by 
chance as often as once in 20 trials. It is on the 
exactitude of our estimate of the magnitude of this 
difference that the precision of our test of significance 
must depend. Even such a tedious series of pre- 
liminary trials, moreover, could only supply a direct 
basis for the test of significance, if we could assume 
the absence of progressive changes, both in the 
weather and in the condition of the soil. This 
consideration effectively demonstrates that the 
accumulation of past experience, as a basis for testing 
significance, is as insecure in theory as it would be 
inconvenient in practice. 

The impossibility of testing two or more treat- 
ments, in the same year, and on identically the same 
land, is not, however, an insuperable obstacle to exact 
experimentation. It is surmounted by testing the 
treatments not on identical land, but on random 
samples of the same experimental areas. From this 
aspect the appropriateness of a random assignment 
of the treatments to the different plots appears most 
inevitably. We shall need to judge of the magnitude 
of the differences introduced by testing our treatments 
upon different plots by the discrepancies between the 
performances of the same treatment in different 
blocks. Our estimate of error must be obtained by 
a comparison of plots treated alike, but it is to be 
applied to interpret the differences observed between 
sets of plots treated differently. The validity of our 
estimate of error for this purpose is guaranteed by the 
provision that any two plots, not in the same block, 
shall have the same probability of being treated alike, 
and the same probability of being treated differently 
in each of the ways in which this is possible. The 
purpose of randomisation in this, as in the previous 
experiments exemplified, is to guarantee the validity 
of the test of significance, this test being based on an 
estimate of error made possible by replication. 

27. Bias of Systematic Arrangements 

In any particular case it will probably be possible 
to assign sets of plots within an area to the several 
treatments so as to equalise their fertility more 
completely than is done by a random arrangement, 
and many systematic arrangements for doing this 
have from time to time been proposed. The effect 
of such a procedure on the test of significance may 
be seen by imagining it carried out on an area under 
uniform treatment, so that the actual yields are not 
at all affected by the reallocation of the plots. In the 
analysis of variance, therefore, the total sum of 
squares is unchanged, as is also the portion ascribable 
to blocks. If, therefore, the agronomist’s ingenuity 
has been successful in diminishing the differences in 
fertility between treatments, the diminution of the 
sum of squares in that line of the table will have been 
exactly counterbalanced by an increase in the sum of 
squares upon which the estimate of error is based. 
The effect of the rearrangement will have been to 
diminish the real errors of the experiment, but at the 
expense of increasing the estimate of error ; so that, 
although the comparisons have really been improved 
in precision they will appear to have been less accurate 
than before, and less reliance will be placed on the 
result. In the opposite case, likewise, if by bad luck 
or bad judgment the systematic arrangement adopted 
has increased rather than lessened the real errors 
of the experiment, then the estimate of error will be 
even diminished, and will be, for both reasons, an 
underestimate of the errors actually incurred. The 
results of using arrangements which differ from the 
random arrangement in either direction are thus in 
one way or the other undesirable. This is to be 
expected, since in both cases the estimate of error is 
vitiated, or rendered unreliable for the purpose for 
which it was made. 
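
The exact counterbalancing of the two sums of squares may be seen in a small numerical sketch, given here in Python. The yields are invented and are supposed to come from an area under uniform treatment, so that the labels are mere dummy treatments ; the function name partition and the two allocations are likewise invented for illustration. One allocation places the dummy treatments in the same order in every block, the other has been contrived to equalise their totals.

```python
def partition(plot_yields, allocation):
    """Sums of squares (total, blocks, treatments, error) when
    allocation[i][j] names the treatment on plot j of block i."""
    b, v = len(plot_yields), len(plot_yields[0])
    grand_total = sum(sum(row) for row in plot_yields)
    correction = grand_total * grand_total / (b * v)
    total_ss = sum(y * y for row in plot_yields for y in row) - correction
    block_ss = sum(sum(row) ** 2 for row in plot_yields) / v - correction
    treatment_totals = [0.0] * v
    for row, labels in zip(plot_yields, allocation):
        for y, label in zip(row, labels):
            treatment_totals[label] += y
    treatment_ss = sum(t * t for t in treatment_totals) / b - correction
    error_ss = total_ss - block_ss - treatment_ss
    return total_ss, block_ss, treatment_ss, error_ss


# Invented uniformity-trial yields: 4 blocks of 3 plots each.
uniform_yields = [
    [10, 12, 14],
    [11, 13, 18],
    [9, 12, 15],
    [10, 15, 17],
]

same_order = [[0, 1, 2], [0, 1, 2], [0, 1, 2], [0, 1, 2]]
equalised = [[0, 1, 2], [2, 0, 1], [1, 2, 0], [0, 2, 1]]

for name, allocation in (("same order", same_order), ("equalised", equalised)):
    total_ss, block_ss, treatment_ss, error_ss = partition(uniform_yields, allocation)
    print(f"{name:10s} total {total_ss:.0f}  blocks {block_ss:.0f}  "
          f"treatments {treatment_ss:.0f}  error {error_ss:.0f}")
```

The total and the portion for blocks are the same under both allocations, so that whatever is taken from the line for treatments reappears in the line for error.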

28. Partial Elimination of Error 

It is to be noted that the restriction upon a purely 
random arrangement which has been imposed, by 
applying each treatment once only on each block of 
land, introduces no such disturbance of the validity 
of the estimate of error. For the differences in 
average fertility between the different blocks of land 
used, which have been, by this restriction, eliminated 
from our experimental comparisons, have been equally 
eliminated from our estimate of error in the analysis of 
variance. Prior to the introduction of this method it 
was, indeed, common for elements of error which had 
been carefully and thoroughly eliminated in the field 
to be reintroduced in the process of statistical estima- 
tion ; so that successful experimental arrangements 
were made to appear to be unsuccessful and vice versa. 
The essential fact governing our analysis is that the 
errors due to soil heterogeneity will be divided, by a 
good experiment, into two portions. The first, which 
is to be made as large as possible, will be completely 
eliminated, by the arrangement of the experiment, 
from the experimental comparisons, and will be as 
carefully eliminated in the statistical laboratory from 
the estimate of error. As to the remainder, which 
cannot be treated in this way, no attempt will be made 
to eliminate it in the field, but, on the contrary, it will 
be carefully randomised so as to provide a valid 
estimate of the errors to which the experiment is in 
fact liable. 


29. Shape of Blocks and Plots 

Having satisfied ourselves that replication, sup- 
plemented by randomisation, will afford a valid test 
of the significance of our comparisons, we may 
consider what modifications of our practical procedure 
will serve to increase the precision of these com- 
parisons. If several areas of land are available for 
experiment some care may usually be given to choose 
one that appears to be uniform, as judged by the 
surface and texture of the soil, or by the appearance 
of a previous crop, though the value of such judgments 
by inspection, with which alone we are here concerned, 
appears to be very easily overrated. After choosing 
the area we usually have no guidance beyond the 
widely verified fact that patches in close proximity 
are commonly more alike, as judged by the yield of 
crops, than those which are further apart. Conse- 
quently, the division of the land into compact or 
approximately square blocks will usually result in 
the blocks being as much unlike as possible, while 
different areas within the same block will be more 
closely similar than if the blocks had been long and 
narrow. The effect of this upon the analysis of 
variance is to place as large a fraction as possible of 
the variance due to soil heterogeneity in the portion 
ascribable to variation between blocks, this portion 
being eliminated from our experimental error ; and 
to leave as little as possible in the variation 
within blocks, which supplies both our experimental 
errors, and our estimate of them. It is, there- 
fore, a safe rule to make the blocks as compact 
as possible. 

With respect to our subdivision of the blocks into 
plots our object is exactly the opposite. The experi- 
mental error arises solely from differences between 
the areas chosen as plots within the same block. These 
differences must be made as small as possible, or, in 
other words, each plot must, so far as may be, sample 
fairly the whole area of the block in which it is placed. 

It is often desirable, therefore, when it does not 
conflict with agricultural convenience in other ways, 
to let the plots lie side by side as narrow strips, each 
running the whole length of its block. It is not, 
however, in every type of experiment an advantage to 
use such elongated plots. Some important causes of 
soil heterogeneity, dependent from agricultural operations, 
affect the land in stripes. Elongated plots will 
then only be advantageous if they can be laid transversely 
to these stripes. When this would entail 
inconvenience, or additional labour in cutting the 
correct area, as in the case of strip plots with cereals, 
running across the drill rows, the labour available for 
the experiment may often be better applied by using 
square plots, and improving the accuracy in other 
ways. Plots of compact shape are, indeed, commonly 
used in experiments in randomised blocks, not because 
the theoretical advantage of using elongated strips, 
in one direction or the other, is not appreciated, but 
for the purely practical reason that to realise it by 
laying strips in the required direction would be a more 
costly method of increasing precision than other 
methods at the experimenter’s disposal. 

30. Practical Example 

The following data show a comparison of the 
yields of five varieties of barley in an experiment 
arranged in randomised blocks, carried out in the 
state of Minnesota in the years 1930 and 1931 and 
reported by F. R. Immer, H. K. Hayes and Le Roy 
Powers in the Journal of the American Society of 
Agronomy. The experiment really dealt with ten 
varieties, of which five have been selected for this 
example. The blocks in the example are twelve 
separate experiments carried out at six locations in 
the state in the two years. 

THE LATIN SQUARE 

31. Randomisation Subject to Double Restriction 

The subdivision of the area of an agricultural 
experiment into compact blocks, in each of which 
all the experimental treatments to be compared are 
equally represented, has been found to add greatly 
to the precision of the experimental comparisons 
obtainable by the expenditure of a fixed amount of 
labour, and supervisory care, to a limited area of land. 
An equally great advantage is obtained, in other 
fields of research, by a similar subdivision of the 
material into relatively homogeneous series, to each 
of which the different experimental treatments are 
applied in equal proportion. The extent of this gain 
is limited only by the degree of homogeneity which 
can be obtained within each series. It is an essential 
condition of experimentation that the experimental 
material is known to be variable, but it is not known, 
in respect of any individual, in what direction his 
response to a given treatment will vary from the 
average. No direct allowance for this variability can, 
therefore, be made. The knowledge which guides us 
in increasing the precision of an experiment is not 
a knowledge of the individual peculiarities of particular 
experimental units, such as plots of land, experimental 
animals, coco-nut palms, or hospital patients, but a 
knowledge that there is less variation within certain 
aggregates of these than there is among different 
individuals belonging to different aggregates. The 
recognition of criteria by which the experimental 
material may be fruitfully subdivided thus plays 
an important part in all types of quantitative 
experimentation. 

It was first shown in experimental agriculture, 
though the principle has since been applied in other 
fields, that the process of subdivision might profitably 
be duplicated. This experimental principle is best 
illustrated by the arrangement known as the Latin 
square, a method which is singularly reliable in 
giving precise comparisons when the number of 
treatments (or varieties, etc.) to be compared is from 
4 to 8. Suppose we wish to compare 6 treatments. 
The experimental area (which need not be an exact 
square in form, but should be a relatively compact 
rectangle) is divided into 36 equal plots lying in 6 
rows and 6 columns. It is then a combinatorial fact 
that we can assign plots to the 6 treatments such that 
for each treatment one plot lies in each row and one 
in each column of the square. It is possible generally 
to do this in a large number of ways, for if we start 
with one solution of the problem, and rearrange the 
rows in it as wholes, in any of the ways in which these 
may be arranged, a large number of new solutions will 
be found. With a 6x6 square the rows may be 
rearranged in 720 ways, including that from which 
we start, so we have at once a set of 720 solutions. 
Equally, or consecutively, we may arrange the 
columns in 720 ways; and, finally, we may do the 
same with the treatments, while still conserving the 
property which the Latin square was designed to 
possess. The process of transformation will generate 
a number of different solutions which varies according 
to the particular square from which we happen to 
start. The smallest transformation set comprises 
144,000 solutions, while the 5 largest each comprise 
7,776,000. If a solution is chosen at random from 
such a set each plot has an equal probability of 
receiving any of the possible treatments, and each 
pair of plots, not in the same row or column, has the 
same probability, namely one-fifth, of being treated 
alike. The process of randomisation, necessary to 
ensure the validity of the test of significance applied 
to the experiment, consists in choosing one at random 
out of the set of squares which can be generated from 
any chosen arrangement. 
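
The process of transformation may be sketched in Python as follows. The starting square is taken, for simplicity, to be the cyclic square, which is an assumption of the sketch ; any other solution would serve as a starting point. The function names are invented for illustration. Rows, columns and treatments are each rearranged at random, and the result is checked to retain the defining property of the Latin square.

```python
import math
import random


def random_latin_square(size, seed=None):
    """Pick a square at random from the transformation set of the
    cyclic square, by rearranging rows, columns and treatments."""
    rng = random.Random(seed)
    base = [[(i + j) % size for j in range(size)] for i in range(size)]
    row_order = rng.sample(range(size), size)   # rows rearranged as wholes
    col_order = rng.sample(range(size), size)   # columns rearranged as wholes
    relabel = rng.sample(range(size), size)     # treatments interchanged
    return [[relabel[base[r][c]] for c in col_order] for r in row_order]


def is_latin(square):
    """True if every treatment occurs once in each row and each column."""
    size = len(square)
    symbols = set(range(size))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(size)} == symbols
                  for j in range(size))
    return rows_ok and cols_ok


print("arrangements of 6 rows:", math.factorial(6))
square = random_latin_square(6, seed=1)
for row in square:
    print(" ".join("ABCDEF"[t] for t in row))
print("Latin property holds:", is_latin(square))
```

Because each of the three rearrangements is chosen with equal probability from all those possible, every member of the transformation set generated from the starting square is equally likely to be drawn.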

The object of arranging plots in a Latin square 
is to eliminate from the experimental comparisons 
possible differences in fertility which may exist between 
whole rows of plots, and between whole columns 
of plots, as they stand in the field. The need for 
such a double elimination was particularly apparent 
to agricultural experimenters owing to the fact that 
in many fields there is found to occur either a 
gradient of fertility across the whole area, or parallel 
strips of land having a higher or lower fertility than 
the average. But, for particular fields it is not known 
whether such heterogeneity will be more pronounced 
in the one or the other direction in which the field 
is ordinarily cultivated. Such soil variations may be 
due in part to the past history of the field, such as 
the lands in which it has been laid up for drainage 
producing variations in the depth and present 
condition of the soil, or to portions of it having been 
manured or cropped otherwise than the remainder ; 
but whatever the causes, the effects are sufficiently 
widespread to make apparent the importance of 
eliminating the major effects of soil heterogeneity, 
not only in one direction across the field, but at the 
same time in the direction at right angles to it. This 
double elimination is effected by the Latin square 
arrangement, which combines the combinatorial fact 
stated above, with the possibility of basing estimates 
of error upon an effective randomisation. 

32. The Estimation of Error 

As has been already illustrated in the experiment 
in randomised blocks, the error will be properly 
estimated only if the same components of heterogeneity 
which have been successfully eliminated by the 
arrangement in the field, are also eliminated in the 
interpretation of the results in the laboratory. This 
elimination is carried out by an analysis of variance 
closely similar to that used in the last chapter. The 
35 independent comparisons possible among 36 yields 
give 35 degrees of freedom. Of these 5 are ascribable 
to differences between rows, and 5 to differences 
between columns. Thus 10 degrees of freedom serve 
to represent the components of heterogeneity, which 
have been eliminated in the field, and must be excluded 
from our estimate of error. Of the remaining 25 
degrees of freedom, 5 represent differences between 
the treatments tested, and 20 represent components 
of error which have not been eliminated, but which 
have been carefully randomised so as to ensure that 
they shall contribute no more and no less than their 
share to the errors of our experimental comparisons. 

The table shows the subdivision of the degrees of 
freedom for a 6 × 6 square, and in general for an 
s × s square used to test s treatments. 

TABLE 8 

                  6 × 6 square.     s × s square. 
Rows                    5               s − 1 
Columns                 5               s − 1 
Treatments              5               s − 1 
Error                  20           (s − 1)(s − 2) 
Total                  35              s² − 1 

Corresponding to each part of the degrees of 
freedom the yield data of the experiment will provide 
a like portion of the total of what we have called sum 
of squares. The first 3 portions may be calculated 
in exactly the same way as are those for blocks and 
treatments in the randomised block arrangement. 
Thus, if the total yields in the rows are A, B, C, D,
E, F, with a grand total M, while the corresponding
mean yields are a, b, c, d, e, f, and m, the portion of
the sum of squares ascribable to rows is

aA + bB + cC + dD + eE + fF - mM.

The portions ascribable to columns and to treatments
are calculated similarly, while that ascribable to error
is calculated by deducting the 3 other items from the
total. This total sum of squares, as in other cases, is
merely the sum of the squares of the deviations of the
yields of all individual plots from their general mean
and may be calculated by subtracting mM from the
sum of the squares of these yields.
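
The arithmetic just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name latin_square_partition and the 3 × 3 yields are invented for the purpose and do not come from the text. Each portion is formed as the sum of (total × mean) less mM, and the error portion is found by deduction.

```python
from fractions import Fraction

def latin_square_partition(yields, layout):
    """Split the total sum of squares of an s x s Latin square.

    yields[i][j] is the yield of the plot in row i, column j;
    layout[i][j] is the treatment letter applied to that plot.
    (Hypothetical helper, written only for this illustration.)
    """
    s = len(yields)
    grand_total = sum(sum(row) for row in yields)            # M
    correction = Fraction(grand_total * grand_total, s * s)  # mM, since m = M / s^2

    def portion(totals):
        # aA + bB + ... - mM, each mean being its total divided by s
        return sum(Fraction(t * t, s) for t in totals) - correction

    row_totals = [sum(row) for row in yields]
    col_totals = [sum(yields[i][j] for i in range(s)) for j in range(s)]
    trt_totals = {}
    for i in range(s):
        for j in range(s):
            letter = layout[i][j]
            trt_totals[letter] = trt_totals.get(letter, 0) + yields[i][j]

    total = sum(y * y for row in yields for y in row) - correction
    rows = portion(row_totals)
    cols = portion(col_totals)
    trts = portion(trt_totals.values())
    # Error is obtained by deducting the three other items from the total.
    return {"rows": rows, "columns": cols, "treatments": trts,
            "error": total - rows - cols - trts, "total": total}

# Hypothetical 3 x 3 yields, invented purely to exercise the arithmetic.
layout = ["ABC", "BCA", "CAB"]
yields = [[10, 12, 11], [13, 9, 14], [8, 15, 12]]
parts = latin_square_partition(yields, layout)
for name, value in parts.items():
    print(f"{name:<10} {float(value):8.3f}")
```

Exact fractions are used so that the four portions add up to the total without rounding error; ordinary floating-point arithmetic would serve equally well in practice.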

33. Faulty Treatment of Square Designs 

When the possibility of effecting a double 
elimination of errors due to soil heterogeneity was 
first realised, the mistake was sometimes made of 
judging the precision of the results merely from the 
observed discrepancies between plots treated alike. 
This would be correct only if the whole 36 plots had 
been assigned at random, and without restriction, to 
the 6 treatments. Its effect is that the 10 degrees of 
freedom, corresponding to Rows and Columns are 
included in the estimate of error. Thus what experimental
design had gained in the field arrangement
was lost or thrown away in the statistical analysis.
Indeed, by this method the apparent precision of the
experiment, and the consequent reliance placed on
its results, is less than if the treatments had been
assigned wholly at random, disregarding rows and
columns, for the large components due to these have
been excluded from the 5 degrees of freedom ascribed
to treatments, and therefore contribute more than
proportionately to the 30 remaining degrees of
freedom, which on this system are regarded as error.

A fault of purely statistical origin also appears in 
some of the earlier work, namely, the use of the 
total number of plots, in place of the number of 
degrees of freedom, as a divisor in obtaining the
estimated variance of the mean square. This may 
lead to the error being seriously underestimated. 
Apart from its arithmetical simplicity, the great 
advantage of arranging the statistical work in an 
analysis of variance, lies in the safeguard it affords 
against errors of these two kinds. Once the degrees 
of freedom are subdivided it is apparent that the 
residue after allowing for Rows, Columns, and Treat- 
ments has only 20 degrees of freedom, and once the 
contributions to the sum of squares due to Rows and 
Columns are identified and set aside, no one would 
think of reintroducing these in attempting to arrive 
at an estimate of error. The mean square, obtained 
by dividing the residual sum of squares by the degrees 
of freedom available for the estimation of error, is a 
valid estimate of the variance of the yield of a single 
plot, due to the components of error which have been 
randomised. The sampling variance of the total of 
six plots having the same treatment is found by 
multiplying the mean square by 6, and that of the 
mean of such plots, by dividing it by 6. The 
variance so obtained is itself liable to sampling errors, 
dependent on the number of degrees of freedom on 
which it is based. Since this number is often small, 
we should use the exact z test for testing the significance
of the group of treatments as a whole. If it is
desired to compare any two particular treatments, 
or to make any other simple contrast among the 
treatments employed, whether based on the totals or 
on the means, we obtain the sampling variance appropriate
to the expression, find the standard error by
taking the square root, and the ratio of the differences 
to be tested to its appropriate standard error will be 
the value of t, appropriate to test its significance, 
with the same number of degrees of freedom as that 
on which the estimate of variance was originally based. 
Readers unfamiliar with the statistical procedure of 
these tests may be referred to Statistical Methods for 
Research Workers. 
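
As a rough illustration of the rules in this paragraph, the following Python sketch computes the sampling variance of a treatment total and of a treatment mean from the error mean square, and the value of t for the difference of two treatment means. The function names and the numerical values are hypothetical. The variance of a difference of two means is taken as the sum of their separate variances, which is the usual rule for a simple contrast and is stated here as an assumption rather than quoted from the text.

```python
import math

def variance_of_total(error_mean_square, plots):
    # Total of `plots` plots treated alike: multiply the mean square.
    return error_mean_square * plots

def variance_of_mean(error_mean_square, plots):
    # Mean of `plots` plots treated alike: divide the mean square.
    return error_mean_square / plots

def t_for_difference(mean_1, mean_2, error_mean_square, plots):
    # Assumption: the two means have independent errors, so variances add.
    standard_error = math.sqrt(2 * variance_of_mean(error_mean_square, plots))
    return (mean_1 - mean_2) / standard_error

# Hypothetical figures for a 6 x 6 square (20 degrees of freedom for error).
error_mean_square = 1200.0
t = t_for_difference(520.0, 450.0, error_mean_square, 6)
print(variance_of_total(error_mean_square, 6))   # variance of a treatment total
print(variance_of_mean(error_mean_square, 6))    # variance of a treatment mean
print(t)                                         # to be referred to t with 20 d.f.
```

The value of t so obtained is referred to the t distribution with the degrees of freedom of the error mean square, 20 in the 6 × 6 square.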

In agricultural trials a single Latin square will 
frequently give precision high enough to reduce the 
standard error to less than 2 per cent. of the yield, and
sometimes to less than 1 per cent. If experimentation 
were only concerned with the comparison of four to 
eight treatments or varieties, it would therefore be not 
merely the principal but almost the universal design 
employed. It is particularly fitted for the comparison 
at a number of different places of a small selected 
group of highly qualified varieties, the relation of 
which to varying conditions of soil and weather needs 
to be explored. Where it fails is to provide a means 
of testing simultaneously a large number of different 
treatments or varieties. The means used to obtain 
precision in such experiments will be developed in later 
chapters. 

34. Systematic Squares 

When the idea of effecting an elimination of 
errors due to soil heterogeneity in two directions at 
right angles was first appreciated, the necessity for 
randomisation in experimental trials was not realised. 
In consequence, certain systematic arrangements were 
adopted. One of these, which may be called a
diagonal square, is shown below:

A B C D E
E A B C D
D E A B C
C D E A B
B C D E A

It will be observed that the conditions that one plot
of each kind lies in each row and one in each column 
are satisfied by this arrangement. The plots receiving 
treatment A are, however, all in a line along the 
diagonal of the square, and other lines parallel to this 
diagonal also receive the same treatment throughout 
their length. Consequently, there is ground to fear 
that if ridges or strips of fertility run obliquely across 
the rows and columns they may give to some of the 
treatments a systematic advantage compared with 
the others. In other words, the components of soil 
fertility in which the areas assigned to different 
treatments may differ, may, not improbably, be 
larger than the remaining components, representing 
differences between plots treated alike, on which 
the estimate of error is based. Consequently, if a 
systematic arrangement of this kind is treated as 
though it were a Latin square two distinct effects
may be anticipated. First, that the actual errors in 
the comparisons of the treatments will be greater than 
if a properly randomised Latin square had been 
used, and, second, that the estimate of error obtained 
from the experiment will be less, by reason of the 
exclusion of the more important components of soil
variability, which have been confounded with the 
treatments. 

The first of these dangers was readily recognised, 
though the second was ignored. Consequently, an 
improved systematic arrangement due to Knut Vik
has been widely used. In this each row is moved
forward two places instead of one, so that the arrangement
is as follows:

A B C D E
D E A B C
B C D E A
E A B C D
C D E A B

In this arrangement the areas bearing each treatment 
are nicely distributed over the experimental area, so 
as to exclude all probability that the more important 
components of soil heterogeneity should influence the 
comparison between treatments. This was clearly the 
intention of the arrangement ; but its fulfilment carries 
with it an unforeseen and unfortunate consequence. 
If, by the skill of the experimenter, the components of
error by which the comparisons between the treatments
are affected are, on the average, less than those
given by a random arrangement, it follows that those 
available for the estimation of error must be greater. 
This is easily seen by considering the subdivision of 
the sum of squares in the analysis of variance. The 
null hypothesis is that the treatments are without 
effect, and therefore that all the differences observed 
among the experimental results are due to experimental
error. They are therefore unaffected by the
arrangement adopted. The components ascribable to 
rows and columns, and eliminated from the experi- 
ment, are also the same however the plots may be 
arranged. Consequently, the total of the two com- 
ponents ascribed to treatments and to error, that is 
to say, the true errors of our comparisons, and the 
estimate of these errors supplied by the experiment, 
have a total independent of the experimental arrange- 
ment. The sole effect of adopting one system of 
arrangement rather than another is on the manner in 
which the fixed total is divided between these two 
parts. The purpose of randomisation is to ensure 
that each degree of freedom shall have, on the null 
hypothesis, the same average content. Any method 
of arrangement, therefore, which diminishes the real 
errors must increase the apparent magnitude of these 
errors, by which the validity of the comparison is to 
be judged. The consequence is that not more, but 
less, reliance is placed, and must be placed on the 
results, as a consequence of the experimenter’s success 
in excluding the larger components of error from his 
comparisons. 

It should be noted that this unfortunate conse- 
quence only ensues when a method of diminishing 
the real errors is adopted, unaccompanied by their 
elimination in the statistical analysis. Thus, when the 
treatments are equally distributed among the rows of 
an experiment, the real error is usually diminished, 
and the estimate of error may be diminished in the 
same measure by eliminating the differences between 
the different rows. The failure of systematic arrangements
came from not recognising that the function
of the experiment was not only to make an unbiased 
comparison, but to supply at the same time a valid 
test of its significance. This is vitiated equally 
whether the components affecting the comparisons 
are larger or smaller than those on which the estimate 
of error is based. The consequences of accepting 
an insignificant effect as significant, or of rejecting
as insignificant one which, with sounder methods of 
experimentation, would have shown itself to be 
significant, are equally unfortunate. In fact, the 
calculation of standard errors is idle and misleading, 
if the method of arrangement adopted fails to 
guarantee their validity, and the same applies to 
all other means of testing significance. 

The consequences surmised as to the effects of 
using the two systematic squares illustrated above 
have in fact been verified in detail by O. Tedin, by 
superimposing these arrangements, each 184 times, 
on the yields obtained in uniformity trials, in which 
the null hypothesis is known to be true ; and by 
comparing the results with those obtained using 
random arrangements. The discrepancies found are 
just what might have been anticipated on theoretical 
grounds. The diagonal square gives larger real
errors accompanied by greater apparent precision. 
The Knut Vik square gives lower real errors accom- 
panied by less apparent precision. It is a curious 
fact that the bias of the Knut Vik square, which was 
unsuspected, appears to be actually larger than that 
of the diagonal square, which all experienced experimenters
would confidently recognise.
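
The two systematic squares of this section are both formed by moving each row forward a fixed number of places, and this can be sketched in Python. The helper names are invented for the illustration. The sketch checks that each arrangement satisfies the Latin square conditions, and lists the plots bearing treatment A so that their alignment may be seen.

```python
def shifted_square(letters, shift):
    """Row r is the first row moved forward r * shift places (cyclically)."""
    s = len(letters)
    return ["".join(letters[(j - r * shift) % s] for j in range(s))
            for r in range(s)]

def is_latin(square):
    """True if every letter occurs once in each row and once in each column."""
    s = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == s and set(row) == symbols for row in square)
    cols_ok = all({square[i][j] for i in range(s)} == symbols for j in range(s))
    return len(symbols) == s and rows_ok and cols_ok

def positions(square, letter):
    """(row, column) of each plot receiving the given treatment."""
    return [(r, row.index(letter)) for r, row in enumerate(square)]

diagonal = shifted_square("ABCDE", 1)   # each row moved forward one place
knut_vik = shifted_square("ABCDE", 2)   # each row moved forward two places

for name, square in (("diagonal", diagonal), ("Knut Vik", knut_vik)):
    print(name, "is Latin:", is_latin(square))
    print("\n".join(" ".join(row) for row in square))
    print("plots of A:", positions(square, "A"))
```

In the diagonal square the plots of A lie along the leading diagonal, whereas in the Knut Vik square they are spread over the area by a knight's move; both, however, are single fixed arrangements and not random selections.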

35. Graeco-Latin and Higher Squares 
The number of arrangements in a Latin square
is known for squares up to 6 × 6. Since the combinatorial
properties which these illustrate are useful
in experimental design apart from their application
to the double elimination of error, it is well to know
something of them. The 2 × 2 square

A B
B A

illustrates the fact that the three independent contrasts
among 4 objects may be resolved into contrasts
between pairs of them in the 3 ways in which such
pairs can be chosen, such as the rows, columns, and
letters of the square. Similarly, the 8 independent
contrasts among 9 objects can be resolved into 4
independent sets of 2 degrees of freedom each, found
by dividing the whole into 3 sets of 3 objects. This
comes from the fact that not only is a 3 × 3 Latin
square possible, but a 3 × 3 Graeco-Latin square. A
pair of letters, one Greek and one Latin, may be
assigned to each cell of the square, so that each Latin
letter appears once in each row and in each column,
and each Greek letter appears once in each row, once
in each column, and once with each Latin letter, as is
shown below:

Aα Bβ Cγ
Bγ Cα Aβ
Cβ Aγ Bα

By rearranging the rows among themselves and
also the columns, so that the letters in the first row
and in the first column are in order, any Latin square
can be reduced to a standard form. Since, after
rearranging the columns so that A is on the left of
the first row, only the remaining rows will be disturbed,
each square of the standard form is capable
of generating s!(s - 1)! different squares, where s is
the number of letters in each row or column. When
s is 2 or 3 there is only one solution in the standard
position, so that the total numbers of Latin squares
are 2 and 12 respectively. There is also essentially
only one 3 × 3 Graeco-Latin square, namely that
shown above; but apart from the rearrangement
of the Latin letters in the rows and columns in
12 ways, the Greek letters may be permuted among
themselves in 6 ways, making 72 arrangements
in all.
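
The enumeration can be checked for the smaller squares by direct counting. The following Python sketch, in which the function names are hypothetical, counts the squares in the standard position by trying every admissible row in turn, and multiplies by s!(s - 1)! to obtain the total number of Latin squares. It is a brute-force check suitable only for small s, not an efficient enumeration.

```python
from itertools import permutations
from math import factorial

def count_standard_squares(s):
    """Count s x s Latin squares with first row and first column in order."""
    first_row = tuple(range(s))

    def extend(rows):
        r = len(rows)
        if r == s:
            return 1
        count = 0
        # Row r must begin with symbol r so that the first column is in order.
        for rest in permutations([x for x in range(s) if x != r]):
            row = (r,) + rest
            # No symbol may repeat within a column.
            if all(row[j] != prev[j] for prev in rows for j in range(s)):
                count += extend(rows + [row])
        return count

    return extend([first_row])

def total_latin_squares(s):
    """Each standard square generates s!(s - 1)! different squares."""
    return count_standard_squares(s) * factorial(s) * factorial(s - 1)

for s in (2, 3, 4, 5):
    print(s, count_standard_squares(s), total_latin_squares(s))
```

For s = 6 the same multiplier applied to the 9408 standard squares quoted later in this section gives 9408 × 720 × 120 = 812,851,200 squares; the brute-force count is not attempted at that size.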

With 4 × 4 squares there are four arrangements
of the Latin letters in the standard position or
4 × 6 × 24 = 576 Latin squares. Only one of these
4, however, yields a Graeco-Latin square, and that in
two ways, so that there are 2 × 6 × 24² (= 6912) 4 × 4
Graeco-Latin squares. The two arrangements of the
Greek letters are, moreover, themselves orthogonal,
so that the 15 degrees of freedom among 16
objects may be divided into 5 independent sets
of 3 each, being the 3 independent comparisons
among 4 sets of 4 objects each, into which
the whole may be divided. An arrangement of
this kind is shown below, in which numeral
suffices are used in place of one set of Greek
letters:

A₁α  B₂β  C₃γ  D₄δ
B₄γ  A₃δ  D₂α  C₁β
C₂δ  D₁γ  A₄β  B₃α
D₃β  C₄α  B₁δ  A₂γ

There are in all 2 × 6 × 24³ such arrangements.

The 5 × 5 Latin squares in the standard position
are 56 in number, and fall into two sets. One set of
50 yields no Graeco-Latin square, but the set of 6,
which are symmetrical about the diagonal, yield each
3 different squares which do not differ merely in a
permutation of the Greek letters. There are therefore
3 × 6 × 24 × 120² 5 × 5 Graeco-Latin squares. The three
different arrangements are all mutually orthogonal,
so that we may add a numeral suffix, as in the 4 × 4
square above, and obtain 6 × 6 × 24 × 120³ solutions.
And we may add a second suffix independent of the
first, and of the letters, in 6 × 6 × 24 × 120⁴ different
ways. An example using two suffices is shown
below:

A₁α₁  B₂β₂  C₃γ₃  D₄δ₄  E₅ε₅
B₃δ₅  C₄ε₁  D₅α₂  E₁β₃  A₂γ₄
C₅β₄  D₁γ₅  E₂δ₁  A₃ε₂  B₄α₃
D₂ε₃  E₃α₄  A₄β₅  B₅γ₁  C₁δ₂
E₄γ₂  A₅δ₃  B₁ε₄  C₂α₅  D₃β₁

Consequently, the 24 degrees of freedom among 25
objects can be subdivided into 6 independent sets of
4 corresponding to the rows, columns, Latin letters,
first suffices, Greek letters and second suffices in the
square above.

Although there are 9408 6 × 6 Latin squares in
the standard position, belonging to 12 distinct types,
yet none of these yields a Graeco-Latin square, a
conclusion arrived at by Euler after a considerable
investigation, but only recently established for certain
by the enumeration of the actual types which occur.
Graeco-Latin squares are easily formed with 7 units
in a side, or any other odd number, but beyond 6 the
larger squares have not been investigated. It may
be shown that with any prime number (p) the p² - 1
degrees of freedom among p² objects may be separated
into p + 1 independent sets of p - 1 degrees of freedom
each, each representing comparisons among p groups
of p objects.
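
One well-known way of exhibiting this separation for a prime p, offered here as an illustrative sketch and not as the construction the author had in mind, is to form the p - 1 squares whose entry in row i and column j is (ki + j) mod p, for k = 1, 2, ..., p - 1. The Python below (with hypothetical function names) verifies for p = 5 that each is a Latin square and that every pair is orthogonal, so that with rows and columns there are p + 1 groupings in all.

```python
from itertools import combinations

def cyclic_squares(p):
    """The p - 1 squares with entry (k*i + j) mod p, for k = 1 .. p - 1."""
    return [[[(k * i + j) % p for j in range(p)] for i in range(p)]
            for k in range(1, p)]

def is_latin(square):
    p = len(square)
    symbols = set(range(p))
    return (all(set(row) == symbols for row in square) and
            all({square[i][j] for i in range(p)} == symbols for j in range(p)))

def orthogonal(sq1, sq2):
    """True if every ordered pair of symbols occurs in exactly one cell."""
    p = len(sq1)
    pairs = {(sq1[i][j], sq2[i][j]) for i in range(p) for j in range(p)}
    return len(pairs) == p * p

p = 5
squares = cyclic_squares(p)
print("Latin:", all(is_latin(sq) for sq in squares))
print("mutually orthogonal:",
      all(orthogonal(a, b) for a, b in combinations(squares, 2)))
# Rows, columns and the p - 1 squares give p + 1 sets of p - 1 degrees of freedom.
print((p + 1) * (p - 1), "=", p * p - 1)
```

The construction relies on p being prime, so that each multiplier k has an inverse modulo p; it is not claimed to work for composite numbers.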

36. Practical Exercises 

For experimental purposes it is, of course, the
properties of the smaller squares that are most useful.
The following data supply an example for readers
who wish to familiarise themselves with the analysis
of variance in its application to experiments designed
in a Latin square. They represent the results of such
an experiment carried out in potatoes at Ely in 1932.
The six treatments designated by A, B, C, D, E, F
consist of different quantities of nitrogenous and
phosphatic fertilisers. The results in lbs. of potatoes
are quoted, with some simplification, from the 1932
Report of Rothamsted Experimental Station.

TABLE 9

Arrangement and Yields of a Latin Square

   E       B       F       A       C       D
  633     527     652     390     504     416      3122

   B       C       D       E       F       A
  489     475     415     488     571     282      2720

   A       E       C       B       D       F
  384     481     483     422     334     646      2750

   F       D       E       C       A       B
  620     448     505     439     323     384      2719

   D       A       B       F       E       C
  452     432     411     617     594     466      2972

   C       F       A       D       B       E
  500     505     259     366     326     420      2376

 3078    2868    2725    2722    2652    2614     16659
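
Readers who prefer to check their arithmetic mechanically may find the following Python sketch of use. It is an illustration only: the variable names are invented, and three yields that are illegible in some copies of the table (571 in the second row, 448 and 505 in the fourth) are taken at the values required by the printed row and column totals. The sketch forms the row, column and treatment totals, and sets out the analysis of variance with the subdivision of degrees of freedom given earlier for a 6 × 6 square.

```python
from fractions import Fraction

LAYOUT = ["EBFACD", "BCDEFA", "AECBDF", "FDECAB", "DABFEC", "CFADBE"]
YIELDS = [
    [633, 527, 652, 390, 504, 416],
    [489, 475, 415, 488, 571, 282],
    [384, 481, 483, 422, 334, 646],
    [620, 448, 505, 439, 323, 384],
    [452, 432, 411, 617, 594, 466],
    [500, 505, 259, 366, 326, 420],
]
S = 6

row_totals = [sum(row) for row in YIELDS]
col_totals = [sum(col) for col in zip(*YIELDS)]
trt_totals = {letter: 0 for letter in "ABCDEF"}
for letters, row in zip(LAYOUT, YIELDS):
    for letter, y in zip(letters, row):
        trt_totals[letter] += y

grand_total = sum(row_totals)
correction = Fraction(grand_total ** 2, S * S)   # mM in the notation used earlier

def portion(totals):
    # Sum of (total x mean) less mM, each mean being total / S.
    return sum(Fraction(t * t, S) for t in totals) - correction

ss_total = sum(y * y for row in YIELDS for y in row) - correction
ss = {
    "Rows": portion(row_totals),
    "Columns": portion(col_totals),
    "Treatments": portion(trt_totals.values()),
}
ss["Error"] = ss_total - sum(ss.values())        # found by deduction
df = {"Rows": S - 1, "Columns": S - 1, "Treatments": S - 1,
      "Error": (S - 1) * (S - 2)}

print(f"{'':<12}{'D.F.':>6}{'Sum of squares':>18}{'Mean square':>16}")
for name in ss:
    print(f"{name:<12}{df[name]:>6}{float(ss[name]):>18.2f}"
          f"{float(ss[name] / df[name]):>16.2f}")
print(f"{'Total':<12}{sum(df.values()):>6}{float(ss_total):>18.2f}")
```

The mean square for error printed in the last line but one is the quantity from which the standard errors of treatment totals and means are derived.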

The following puzzle is of service in familiarising
the mind with the combinatorial relationships underlying
the use of the Latin square, and the like, in
experimental design:

Sixteen passengers on a liner discover that they
are an exceptionally representative body. Four are
Englishmen, four are Scots, four are Irish, and four
are Welsh. There are also four each of four different
ages, 35, 45, 55, and 65, and no two of the same age
are of the same nationality. By profession also four
are lawyers, four soldiers, four doctors, and four
clergymen, and no two of the same profession are of
the same age, or of the same nationality.

It appears, also, that four are bachelors, four 
married, four widowed, and four divorced, and that 
no two of the same marital status are of the same 
profession, or the same age, or the same nationality. 
Finally, four are conservatives, four liberals, four 
socialists, and four fascists, and no two of the same 
political sympathies are of the same marital status, 
or the same profession, or the same age, or the same 
nationality. 
Three of the fascists are known to be an unmarried
English lawyer of 65, a married Scotch soldier of 55, 
and a widowed Irish doctor of 45. It is then easy 
to specify the remaining fascist. 

It is further given that the Irish socialist is 35, 
the conservative of 45 is a Scotchman, and the 
Englishman of 55 is a clergyman. What do you 
know of the Welsh lawyer ? 
THE FACTORIAL DESIGN IN
EXPERIMENTATION

37. The Single Factor 

In expositions of the scientific use of experimentation 
it is frequent to find an excessive stress laid on the 
importance of varying the essential conditions only 
one at a time. The experimenter interested in the 
causes which contribute to a certain effect is supposed, 
by a process of abstraction, to isolate these causes 
into a number of elementary ingredients, or factors,
and it is often supposed, at least for purposes of
exposition, that to establish controlled conditions in
which all of these factors except one can be held
constant, and then to study the effects of this single
factor, is the essentially scientific approach to an
experimental investigation. This ideal doctrine seems
to be more nearly related to expositions of elementary
physical theory than to laboratory practice in any
branch of research. In experiments merely designed
to illustrate or demonstrate simple laws, connecting
cause and effect, the relationships of which with the
laws relating to other causes are already known, it
provides a means by which the student may apprehend
the relationship, with which he is to familiarise himself,
in as simple a manner as possible. By contrast, in the 
state of knowledge or ignorance in which genuine 
research, intended to advance knowledge, has to be
carried on, this simple formula is not very helpful.
We are usually ignorant which, out of innumerable
possible factors, may prove ultimately to be the most
important, though we may have strong presuppositions
that some few of them are particularly worthy of study.
We have usually no knowledge that any one factor
will exert its effects independently of all others that
can be varied, or that its effects are particularly
simply related to variations in these other factors.
On the contrary, if single factors are chosen for
investigation, it is not because we anticipate that the
laws of nature can be expressed with any particular
simplicity in terms of these variables, but because 
they are variables which can be controlled or measured 
with comparative ease. If the investigator, in these 
circumstances, confines his attention to any single 
factor, we may infer either that he is the unfortunate 
victim of a doctrinaire theory as to how experimenta- 
tion should proceed, or that the time, material, or 
equipment at his disposal are too limited to allow 
him to give attention to more than one narrow aspect 
of his problem. 

The modifications possible to any complicated 
apparatus, machine, or industrial process must always 
be considered as potentially interacting with one 
another, and must be judged by the probable effects of 
such interactions. If they have to be tested one at a 
time this is not because to do so is an ideal scientific 
procedure, but because to test them simultaneously
would sometimes be too troublesome, or too costly.
In many instances, as will be shown in this chapter,
this impression is greatly exaggerated. Indeed, in a
wide class of cases an experimental investigation, at
the same time as it is made more comprehensive, may
also be made more efficient, if, by more efficient we
mean that more knowledge and a higher degree of
precision are obtainable by the same number of
observations.

38. A Simple Factorial Scheme 

As an example, let us consider the case in which 
we require to study experimentally the effects of 
variations in composition of a mixture containing four 
active ingredients. It is indifferent for our illustration
for what purpose the mixture may be required.
It may be an industrial product, a medicinal prescription,
a food ration, or an artificial manure. It is
sufficient that its efficacy in practical use cannot be 
calculated a priori, but can be measured by its effect 
in particular trials ; and that the ideal quantitative 
composition is unknown in respect to all four in- 
gredients. In principle, it matters little whether our 
doubts extend to wide variations in the quantities to 
be employed, or whether the variations in question are 
proportionately small ; as they will be when wide 
experience has already determined the ideal propor- 
tions within narrow limits. Nor will it affect the 
question in principle, if, as in some cases, the cost of 
the items is an important consideration to be debited 
in assessing the net advantage of using more of 
them; or whether, as in other cases, the direct 
observations or measurements made in the test are
alone to be considered. 

So defined, the situation is clearly one of very 
wide occurrence. Let us consider an experiment in 
which 16 different mixtures are made up in the 16 
combinations possible by combining either a larger 
or a smaller quantity of each of the 4 ingredients to 
be tested. The quantities may differ in some cases 
by a factor as large as 2, in which case each mixture
will contain of each ingredient either a single or a 
double quantity. The factor need not be the same 
for all ingredients. For some one or more of them 
we may doubt whether its presence in any quantity 
is desirable ; or whether it had not better be omitted 
altogether. We shall then test mixtures with or 
without this component. But, in any case, the 
general question as to whether, of each ingredient, 
more or less can be added with advantage, can be 
settled by making up all combinations, using either 
more or less of each component. 

We will suppose now that 6 tests are made with 
each mixture, or 96 in all. The particular cases in 
which the different mixtures are to be tried will be 
assigned strictly at random ; or, as in the agricultural 
experiment illustrated in Chapter IV. (randomised 
blocks), the tests will be divided into 6 series, each 
supposedly more homogeneous than the whole, and 
the 16 members of each will be assigned at random 
to different mixtures. Since no difference of principle 
arises here we will suppose that the tests are assigned 
entirely at random. 
In respect of any one particular ingredient the
first point to be noted is that we have for comparison 
48 cases in which a larger, and 48 in which a smaller 
quantity has been employed. These 48 are not all 
alike in other respects. They are alike in sets of 6, 
but the eight sets differ from each other in the other 
ingredients used. Corresponding to each set of 6 in 
which a larger quantity of the first ingredient has 
been used, there will, however, be a set of 6 with a 
smaller quantity, and exactly similar to it in respect 
of the other ingredients. The difference, if any, in 
the effects observed in these two sets must be ascribed, 
apart from random fluctuations, to the particular 
ingredient in which they differ. Moreover, there are 
8 such pairs of sets to supply confirmatory evidence 
of this effect if it exists. For each single factor, 
therefore, we have a direct comparison of the averages 
or totals of two sets of 48 trials ; and this comparison 
will have the same precision as if the whole of the 96
trials had been devoted to testing the efficacy of one 
single component. The first fact contributing to the 
efficiency of experiments designed on the factorial 
system, is that every trial supplies information upon 
each of the main questions, which the experiment is
designed to examine. 

The advantage of the factorial arrangement over 
a series of experiments, each designed to test a single 
factor is, however, much greater than this. For with 
separate experiments we should obtain no light what- 
ever on the possible interactions of the different 
ingredients, even if we had gone to the labour of 
performing 96 experiments with each of them, and
so, for each singly, had attained the same precision 
as that which the factorial experiment can give. 
If, for example, an increase in ingredient A were 
advantageous in the presence of B, but were ineffective 
or disadvantageous in its absence, we could only 
hope to learn this fact by carrying out our test of A 
both in the presence and in the absence of B ; and this, 
in fact, is what the factorial experiment is designed 
to do — but to do so thoroughly that the total system 
of possible interactions is explored in its entirety. To 
test if the effect of A in the presence of B is greater 
than in its absence, we may compare the total 
difference ascribable to A from one set of 24 pairs 
of otherwise comparable trials, in all of which B 
is present, with the corresponding effect derived 
from the remaining pair of sets of 24, in all of 
which B is absent. This is the same as comparing 
the results of 48 trials in which A and B are both
employed in large, or both in small, quantity, 
with the other 48 trials in which the larger quantity 
of A and the smaller quantity of B, or vice versa, 
have been combined. We have thus again a com- 
parison between the results of two sets of 48 trials, 
comparable in all other respects, save that in which we 
are interested, namely, whether the effect of an increase 
in A is, or is not, influenced by an increase in B. 
This difference, if it exists, might equally be expressed 
by saying that an increase in B is influenced in its 
effect by an increase in A. The difference, in fact, 
involves the two ingredients symmetrically, and is 
technically spoken of as the interaction of A with B.
There are clearly 6 such interactions between pairs 
of the 4 ingredients, and each of these is evaluated 
by the experiment with the same precision. 

The 4 contrasts for single ingredients, and the 
6 interactions between pairs of them, still do not 
exhaust the possible interactions which may be 
present, and which the experiment is competent to 
reveal. It might be, for example, that the interaction
between A and B is itself influenced by the
quantity of the ingredient C. If we were to calculate 
the interaction of A and B for those mixtures which 
had a larger quantity of C, and subtract from this 
the corresponding interaction for mixtures having 
the smaller quantity of C, we should, in fact, be
adding up all the results from mixtures having either 
all three or any one of the ingredients A, B and C 
present in the larger quantity, and subtracting from 
this total the effects of all the mixtures having either 
none, or any two. The measure of discrepancy is, 
therefore, symmetrically related to the three in- 
gredients A, B and C, and is known as the interaction 
of these three ingredients. If such an interaction 
exists the fact may be stated, as above, by saying 
that the interaction of A and B depends on the 
quantity of C present. Equally, we might make the
equivalent statement that the interaction of B and C
depends on the quantity of A present, or that the
interaction of C and A depends on the quantity of B.
The three statements are logically equivalent; and
for any set of three ingredients there will be only one
numerical measure of the interaction to be ascertained
from the data. Since there are four sets of three 
which we might choose, there are four triple inter- 
actions which can be evaluated, each of which, like 
the interactions between two ingredients, is found by 
the comparison of a set of 48 chosen tests, with 
another set of 48 in other respects comparable with it. 

Finally, if we ask whether the interaction of A, 
B and C is dependent on the quantity of D present,
it appears that the answer to our question must 
be a comparison symmetrically related to all four 
ingredients. This is made by comparing the effects 
of those mixtures in which there is a larger quantity 
of 4, 2 or none of the ingredients with the effects of 
those mixtures in which there is a larger quantity of 
3 or 1 of them. There is thus only one quadruple 
interaction in the system, and this will be evaluated 
with the same precision as all the others. We thus 
fintl that the 15 independent comparisons among the 
16 different mixtures, which have been made up, may 
be logically resolved into 15 intelligible components, 
as shown in the following table : — - 

TABLE 10 

Effects of single ingredients ... 4 

Interactions of 2 „ ... 6 

„ „ 3 .. ... 4 

„ ’ „ 4 ■_> 

Total . . -15 

The numbers are the coefficients of the binomial 
expansion (1+x)^4, omitting the first. Each particular 
interaction may be clearly designated by the selection 
of letters, such as ABC, which represent the ingredients 
contributing to it. With respect to sign it is a 
convenient convention to speak of an interaction as 
positive when the measured effects of the set of 
mixtures, which contain all the ingredients involved 
in larger quantity, exceed the measured effects of the 
remaining set. 
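
The resolution into 15 components may be checked by direct enumeration. What follows is a minimal sketch in Python and is not part of the original account; the coding of the smaller and larger quantities as 0 and 1, and the names `mixtures` and `contrast_signs`, are assumptions made purely for illustration. A mixture is given a positive sign in a comparison when an even number of the ingredients concerned are at the smaller quantity, which agrees with the sign convention just stated.

```python
from itertools import combinations, product
from math import comb

FACTORS = "ABCD"

# The 16 mixtures: each ingredient at the smaller (0) or larger (1) quantity.
mixtures = list(product((0, 1), repeat=len(FACTORS)))


def contrast_signs(subset):
    """Sign (+1 or -1) given to every mixture by the effect or
    interaction of the ingredients named in `subset`."""
    signs = []
    for mix in mixtures:
        # Count the ingredients of the subset held at the smaller quantity.
        low = sum(1 for f in subset if mix[FACTORS.index(f)] == 0)
        signs.append(1 if low % 2 == 0 else -1)
    return signs


# One comparison for every non-empty selection of letters.
contrasts = {
    "".join(s): contrast_signs(s)
    for k in range(1, len(FACTORS) + 1)
    for s in combinations(FACTORS, k)
}

# Numbers of effects and interactions of 1, 2, 3 and 4 ingredients.
counts = [comb(len(FACTORS), k) for k in range(1, len(FACTORS) + 1)]
print(counts, sum(counts))
```

Each of the 15 sign patterns divides the 16 mixtures into two sets of 8, so that with 6 trials of each mixture every comparison is one of 48 tests against 48.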

The second advantage, which we may note, 
therefore, in a factorially arranged experiment, is 
that, in addition to measuring the effects of the four 
single ingredients with the same precision as though 
the whole of the experiment had been devoted to each 
of them, it measures also the 11 possible interactions 
between these ingredients with the same precision. 
These interactions may, or may not, be considerable 
in magnitude. It is none the less of importance in 
practical cases to know whether they are considerable 
or not. 

The precision with which the 15 comparisons have 
been made is, of course, estimated from the variation 
between the results of the 6 trials of each mixture. 
Each mixture will, therefore, provide 5 degrees of 
freedom for making a pooled estimate of the precision 
of the experiment, thus giving 80 degrees of freedom 
in all. The analysis of variance of the 96 trials thus 
takes the simple form : — 

TABLE 11 
                         Degrees of 
                         Freedom. 
Treatments . . . .           15 
Error  . . . . . .           80 
Total  . . . . . .           95 

The sum of squares corresponding to error is divided 
by 80 to obtain the estimated variance of a single 
experiment. This, multiplied by 96, gives the 
estimated variance of the difference between the sum 
of any chosen set of 48 results, and the sum of the 
remainder. Thus we have only to add one-fifth of 
its value to the sum of squares for error, and take the 
square root, to find the standard deviation appropriate 
to each of the 15 comparisons that have been made, 
and in relation to which the significance of each may 
be judged. Since 80 is a relatively ample number of 
degrees of freedom for the estimation of error, those 
differences may conveniently be judged significant, 
which exceed twice their standard errors. 
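
The arithmetic of this paragraph can be set out as a short sketch. The function name `comparison_std_error` and the value taken for the sum of squares for error are hypothetical, chosen only to show the steps; nothing here is data from an actual experiment.

```python
from math import sqrt


def comparison_std_error(error_ss, error_df=80, n_trials=96):
    """Standard error of the difference between the sum of one chosen
    set of 48 results and the sum of the remaining 48."""
    variance_single = error_ss / error_df       # estimated variance of one result
    variance_difference = n_trials * variance_single
    return sqrt(variance_difference)


error_ss = 200.0                                # hypothetical sum of squares for error
se = comparison_std_error(error_ss)

# Since 96/80 = 1.2, the same value is reached by adding one-fifth
# to the sum of squares for error and taking the square root.
shortcut = sqrt(error_ss + error_ss / 5)

# A difference exceeding twice its standard error is judged significant.
threshold = 2 * se
print(se, shortcut, threshold)
```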

Had the experiment been arranged, for greater 
precision, in 6 blocks of trials, we should in the 
analysis eliminate the differences between these blocks 
from the estimate of error. We should then have : — 


TABLE 12 
                         Degrees of 
                         Freedom. 
Blocks . . . . . .            5 
Treatments . . . .           15 
Error  . . . . . .           75 
Total  . . . . . .           95 


leaving 75 degrees of freedom for the estimation of 
error ; so that the sum of squares for error must be 
multiplied by 96/75 or 1.28 to obtain the variance 
of any comparison between 48 chosen tests, and the 
remaining 48. 

39. The Basis of Inductive Inference 

We have seen that the factorial arrangement 
possesses two advantages over experiments involving 
only single factors : (i) Greater efficiency, in that these 
factors are evaluated with the same precision by means 
of only a quarter of the number of observations that 
would otherwise be necessary ; and, (ii) Greater 
comprehensiveness in that, in addition to the 4 effects 
of single factors, their 11 possible interactions are 
evaluated. There is a third advantage which, while 
less obvious than the former two, has an important 
bearing upon the utility of the experimental results 
in their practical application. This is that any 
conclusion, such as that it is advantageous to increase 
the quantity of a given ingredient, has a wider 
inductive basis when inferred from an experiment 
in which the quantities of other ingredients have 
been varied, than it would have from any amount 
of experimentation, in which these had been kept 
strictly constant. The exact standardisation of 
experimental conditions, which is often thoughtlessly 
advocated as a panacea, always carries with it the 
real disadvantage that a highly standardised 
experiment supplies direct information only in respect 
of the narrow range of conditions achieved by 
standardisation. Standardisation, therefore, weakens 
rather than strengthens our ground for inferring a 
like result, when, as is invariably the case in practice, 
these conditions are somewhat varied. As the analysis 
of variance clearly shows, such standardisation of 
conditions is only of value in increasing the precision 
of the experimental data when it is applied to the 
tests to be compared experimentally, differences 
between which affect the real errors, and the estimates 
of error with which they are randomised. Such 
standardisation is out of place when applied to 
parallel tests designed, by multiplying the observa- 
tions, to increase the precision of all comparisons. 
In fact, as the factorial arrangement well illustrates, 
we may, by deliberately varying in each case some 
of the conditions of the experiment, achieve a wider 
inductive basis for our conclusions, without in any 
degree impairing their precision. 

40. Inclusion of Subsidiary Factors 

Factorial types of arrangement, by reconciling 
the desiderata of a relatively wide exploration with 
high precision, without sacrifice of either advantage, 
are eminently suitable when alternative procedures of 
any kind are apparently open to the experimenter. 
Thus it may be doubtful whether the cultivation of 
an orchard is most advantageously carried out in the 
spring or the autumn of the year. More than one 
method of cultivation, also, may be advocated with 
apparently equal force. Numerous similar examples 
will occur when the modification of any practical 
procedure is under consideration. Schools of opinion 
are formed on points which, without systematic trial, 
we cannot know to have, or to lack, real importance, 
as bee-keepers dispute whether the combs should lie 
parallel from the front of the hive to the back, or 
should lie transversely to this direction. These are all 
qualitative differences, but, provided that there is a 
quantitative method of measuring any real advantage 
which they may confer, they may be incorporated in 
experiments designed primarily to test other points ; 
with the real advantages that, if either general effects 
or interactions are detected, that will be so much 
knowledge gained at no expense to the other objects 
of the experiment ; and that, in any case, there will 
be no reason for rejecting the experimental results on 
the ground that the test was made in conditions 
differing in one or other of these respects from those 
in which it is intended to apply the results. 

There is another feature widespread in experi- 
mental work where the factorial arrangement relieves 
the investigator of a troublesome cause of anxiety ; 
namely, that the isolation of a single factor for 
separate experimentation can often be achieved only 
at the expense of a somewhat questionable arbitrariness 
of definition. To consider a very simple case, the 
experimenter may undertake to test the respective 
yields of two or more varieties of a cereal crop. At 
first sight it will readily be admitted that all other 
factors likely to affect the yield shall be made the 
same for the varieties to be compared. So the 
quantity of seed sown must be equalised. But if the 
seed-rates are measured in bushels per acre, the 
bushels, measured as equal volumes of seed of the 
different varieties, may differ in their weight; or, 
again, they may differ even more conspicuously in the 
numbers of seeds which they contain. If the test is 
to be carried out at the same single seed-rate for each 
variety, the equality of seed-rate must be defined, 
for the purpose of the test, with some degree of 
arbitrariness. Unless we have assurance that varia- 
tions in seed-rate will not affect the yield obtained, 
this arbitrariness will infect the results of the 
experiment. Clearly, on the question at issue, as 
between the varieties, it would be desirable to make 
a comparison using for each variety that seed-rate 
which is most profitable for it. This may differ from 
equality, as measured by volume, or by weight, or 
by number of seeds to the acre, by reason of the 
different capacities of the varieties to withstand causes 
of death, or, by forming additional shoots, to fill 
up vacant spaces. The experimenter who, in testing 
the varieties, at the same time makes a sufficient 
variation in seed-rates to embrace the optimal value, 
is clearly in a better position to meet the criticisms 
which may arise from these considerations, than is 
one who adopts any arbitrary conventions as to what 
the phrase “ equality of conditions ” is intended to 
convey. 

In the conditions of cereal cultivation, moreover, 
variations in seed-rate inevitably raise with them 
further questions. In particular, what space should 
be left between the seed-rows or drills ? At a heavier 
seed-rate it is reasonable to suppose that the drills 
could with advantage be placed nearer together than 
they could if less corn were sown. Consequently, 
the question as to what is the most advantageous 
seed-rate can never be answered experimentally 
without a simultaneous variation in the width of 
the drills. Equally, no investigation of the question 
of drill-width can be satisfactory unless the amount 
of seed sown be also varied. The simple question, 
therefore, of the comparison of the yields of different 
varieties of the same crop carries with it the simul- 
taneous investigation of the effects of variations in 
other items of agricultural procedure. These do not, 
however, if a logical and comprehensive plan of 
experimentation be adopted, add to the cost or 
labour of an effective experimental programme, for 
extensive replication of the plots sown is in any case 
a necessity, if accurate results are to be attained ; 
and this replication, at the same time as it increases 
the precision of comparisons, may be used simul- 
taneously to supply the desired variations in all 
conditions likely to be bound up with that which is 
the primary object of the investigation. 

For a given number of trials the more experimental 
variants are tried the fewer will be the absolute 
replicates. Thus, with 96 trials, we may have 6 
absolute replicates of 16 different experimental vari- 
ants, and these, as we have seen, will still leave 80 
degrees of freedom for the estimation of error. With 
the same material we might test 48 different mixtures, 
or treatments, and still have duplicate results for each. 
There would then be 48 degrees of freedom left for 
error. Although each test is only made in duplicate, 
yet all the primary questions, into which the differences 
among them may be resolved, are answered with the 
same precision as though the whole experiment had 
been devoted to each of these questions alone ; the 
loss of absolute replication is made good by the 
hidden replication inherent in the factorial arrange- 
ment. The experimental error is no larger, although, 
being based on only 48 independent comparisons, it 
is known with a slightly lower precision, than if 80 
had been available. With factorial experiments 
designed to make a large number of comparisons 
there will, in fact, usually be an ample number of 
degrees of freedom for the estimation of error. 
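
The two allocations of the 96 trials may be compared in a few lines. This is only an illustrative sketch; the function name `error_degrees_of_freedom` is an assumption, and it supposes, as in the text, that every variant is tried equally often and that no blocks are eliminated.

```python
def error_degrees_of_freedom(n_trials, n_variants):
    """Degrees of freedom for error when each of `n_variants` is
    replicated equally often among `n_trials` (no blocking)."""
    if n_trials % n_variants != 0:
        raise ValueError("each variant must be tried equally often")
    replicates = n_trials // n_variants
    # Each variant supplies (replicates - 1) comparisons among its own trials.
    return n_variants * (replicates - 1)


six_fold = error_degrees_of_freedom(96, 16)    # 6 absolute replicates of 16 variants
duplicate = error_degrees_of_freedom(96, 48)   # 48 variants, each in duplicate
print(six_fold, duplicate)
```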

41. Experiments without Replication 

It may occasionally be desirable to dispense with 
absolute replication altogether. This occurs when it 
is required to test a large number of combinations 
simultaneously without enlarging the experiment so 
greatly as to make repeated use of each combination ; 
and especially, when there is reason to believe that 
most of the interactions involving 3 or more factors 
will be unimportant experimentally, in the sense that 
their real effects, if any, will be too small to be 
statistically significant, in an experiment of the size 
contemplated. In such cases the whole of the 
independent comparisons which the experiment pro- 
vides may be assigned to the factors tested and their 
interactions. There will in fact be none ascribable to 
pure error, but there will be numerous interactions 
the apparent effects of which are principally due 
to error, and these may be used to provide a 
measure of the precision of the more important 
comparisons. 

For example, we may have 6 factors each 
providing two alternatives to be tested; and though 
we do not know a priori that these factors are 
unrelated, we may have reason to think that their 
action should be sufficiently nearly independent for 
all but the simple interactions between pairs of them 
to be experimentally negligible. If each of the 
combinations be tried once, the 63 independent 
comparisons which the experiment provides may 
be analysed as in the table below : — 


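TABLE 13 
                                      Degrees of 
                                      Freedom. 
Single factors . . . . . . . . .           6 
Interactions between 2 factors .          15 
Interactions between 3 factors .          20  ) 
Interactions between 4 factors .          15  )  Error 42 
Interactions between 5 factors .           6  ) 
Interaction between 6 factors  .           1  ) 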



The 6 primary effects of the individual factors 
will each be determined, as we have seen, by the 
whole weight of the evidence of 64 trials. To test the 
significance of the differences observed, we may use 
the 42 degrees of freedom for interactions involving 
more than 2 factors to supply an estimate of error; 
that is, an estimate of the variance, due to error, 
of a single trial. The same test may be applied to 
any of the 15 interactions between pairs of factors 
which may seem to be possibly significant. If none 
of these are very important compared with the 
average of the remainder we have an empirical 
confirmation of the supposition upon which the 
experiment was designed, that the 6 primary factors 
are not strongly related. If, however, contrary to 
expectation, it appeared that there was a large inter- 
action between two of these factors, it would be 
advisable to examine separately the interactions 
between 3 factors which involve these two. We 
should thus pick out 4 particular suspects out of the 
42 degrees of freedom provisionally ascribed to error, 
leaving 38 in comparison with which their significance 
could be tested. 
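
The partition of the 63 comparisons, and the count of 4 suspects, can be verified as follows. The sketch is illustrative only; the letters A to F for the six factors, and the choice of A and B as the pair showing a large interaction, are assumptions.

```python
from itertools import combinations
from math import comb

n_factors = 6
factors = "ABCDEF"

# Degrees of freedom for effects and interactions of each order.
df_by_order = {k: comb(n_factors, k) for k in range(1, n_factors + 1)}
total_df = sum(df_by_order.values())

# Interactions of more than 2 factors are provisionally ascribed to error.
error_df = sum(v for k, v in df_by_order.items() if k >= 3)

# Suppose A and B interact strongly: the interactions between 3 factors
# which involve both of them are examined separately.
suspects = [t for t in combinations(factors, 3) if {"A", "B"} <= set(t)]
remaining_error_df = error_df - len(suspects)

print(df_by_order, total_df, error_df, len(suspects), remaining_error_df)
```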

Such a plan could, of course, fail ; and would do 
so if a large number of the high order interactions 
were more important than the primary effects. It 
would, however, not likely be tried in these circum- 
stances. More frequently we should find that the 
true error had been but slightly inflated, and the 
effective precision of the experiment slightly reduced, 
by the inclusion in the estimate of error of some small 
components really due to interactions of the factors 
tested. 

42. The Problem of Controlling Heterogeneity 

It has been shown in the last chapter that great 
advantages may be obtained by testing experimentally 
an aggregate of variants, systematically arranged on 
the factorial scheme. The illustrations have shown 
that such aggregates may be very numerous. When, 
in Chapters III. and IV., the advantages of pairing, 
or of grouping, the material in relatively homogeneous 
blocks were discussed, it was seen that the precision 
attainable by a given amount of experimentation was 
liable to be reduced, when the number of comparisons 
to be made was large, by reason of the increased 
heterogeneity, which must in practice then be per- 
mitted, among the tests in the same group. 

In agricultural experimentation this effect ex- 
presses itself very simply in the increased size of the 
blocks of land, each of which is to contain plots 
representative of all the different combinations to be 
tested. Thus if there are 48 different combinations, 
each block will have to be nearly an acre in extent, 
and it is common experience that within so large an 
area considerably greater soil heterogeneity will be 
found, than would be the case if the blocks could be 
reduced in size to a quarter of an acre or less. The 
same consideration applies to experimentation of all 
kinds. If large quantities of material are needed, or 
large numbers of laboratory animals, these will almost 
invariably be more heterogeneous than smaller lots 
could be made to be. In like manner, extensive 
compilations of statistical material often show evidence 
of such heterogeneity among the several parts which 
have been assembled, and are seriously injured in 
value, if this heterogeneity is overlooked in making 
the compilation. 

In many fields of experimentation quantitative 
knowledge is lacking as to the degree of heterogeneity 
to be anticipated in batches of material of different 
size, or drawn from more or less diverse sources. 
This is a drawback to precise planning, which 
increased care in experimental design will doubtless 
steadily remove. While, therefore, greater hetero- 
geneity is always, on general principles, to be 
anticipated, when the scope of an experimental 
investigation is to be enlarged, this feature will often 
do but little to annul the advantages discussed in the 
last chapter. Nevertheless, the means by which such 
heterogeneity can be controlled are widely applicable, 
and will generally give a further increase in pre- 
cision. In agricultural field trials, where the study 
of heterogeneity has been itself the object of a 
great deal of deliberate investigation, it is certain 
that the further advantages to be gained are very 
considerable. 

In the last chapter, we have seen that a factorially 
arranged experiment supplies information on a large 
number of experimental comparisons. Some of these, 
such as the effects of single factors, will always be of 
interest. It is seldom, too, that we should be willing 
to forego knowledge of any interactions which may 
exist between pairs of these factors. But, in the case 
of interactions involving 3 factors or more, the 
position is often somewhat different. Such inter- 
actions may with reason be deemed of little experi- 
mental value, either because the experimenter is 
confident that they are quantitatively unimportant, 
or because, if they were known to exist, there would 
be no immediate prospect of the fact being utilised. 
In such cases we may usefully adopt the artifice 
known as “ confounding.” This consists of increasing 
the number of blocks, or groups of relatively homo- 
geneous material, beyond the number of replications 
in the experiment, so that each replication occupies 
two or more blocks ; and, at the same time, arranging 
that the experimental contrasts between the different 
blocks within each replication shall be contrasts 
between unimportant interactions, the study of which 
the experimenter is willing to sacrifice, for the sake 
of increasing the precision of the remaining contrasts, 
in which he is specially interested. To do this it 
must be possible to evaluate these remaining contrasts 
solely by comparisons within the blocks. It is not 
necessary, however, that comparisons within any one 
block should provide the required contrast, but only 
that it should be possible to build this up by 
comparisons within all the blocks of a replication. 

43. Example with 8 Treatments, Notation 

A very simple example of confounding suggests 
itself when we have 3 factors, of only 2 variants 
each, and unite them in an experiment involving 
the 8 combinations, which they provide. It was in 
such an experiment that the principle of confounding 
was first used. As a convenient notation for these 
cases we may call the 3 factors A, B and C. Let us 
choose to regard one of these 8 variants as a standard, 
or “ control,” and denote it by the symbol (1). The 
variant which differs from this only in the factor A 
we will denote by (a), likewise (b) and (c) will stand 
for the two other treatments which differ from the 
control in respect of the factors B and C. The 
remaining treatments will differ from the control in 
either two, or in all three of the factors used, and may 
therefore be denoted unequivocally by the symbols 
(ab), (ac), (bc) and (abc). The same symbols may be 
used for the treatments, and for the quantitative 
measures of the results of these treatments, which the 
experiment is designed to ascertain. Thus (ab) will 
stand either for a particular treatment applied to an 
agricultural crop, or for the total yield of the plots 
which have received this treatment. Equally, it might 
be the average live weight, or the average longevity, 
of experimental animals which have been treated in 
this way, if the experiment is aimed at studying the 
conditions which influence weight or longevity. 

In contradistinction to the treatments, or 
experimental variants, denoted by such symbols as (a) we 
shall use A, B, etc., to denote the experimental 
contrasts found for the factors, and for their inter- 
actions. Thus A will stand for the factor A and for 
its effect as measured by summing the treatments the 
symbols of which contain a, and deducting those 
which do not; i.e., 

A = (abc) + (ab) + (ac) + (a) - (bc) - (b) - (c) - (1), 

or, by analogy with the rules of multiplication of 
algebraic quantities, 

A = (a - 1)(b + 1)(c + 1). 

Similarly, 

B = (b - 1)(a + 1)(c + 1), 

and 

C = (c - 1)(a + 1)(b + 1). 

The corresponding expressions for the interactions 
between pairs of factors are then easily seen to be 

AB = (a - 1)(b - 1)(c + 1), 

BC = (b - 1)(c - 1)(a + 1), 

and 

CA = (c - 1)(a - 1)(b + 1). 

Finally, the interaction of all 3 factors has the 
symbolic expression 

ABC = (a - 1)(b - 1)(c - 1). 

The purpose of the symbolism is to make it easy to 
denote any particular contrast, or interaction, and to 
ascertain at once how the treatments should be 
compounded in evaluating it. 
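
The multiplication of the symbolic brackets may be carried out mechanically. The following sketch is an illustration only and not part of the original notation; the function name `expand` is hypothetical. A bracket has the form (x - 1) when the factor X appears in the name of the contrast, and (x + 1) otherwise, and each treatment arises by taking from every bracket either the letter or the unit term.

```python
from itertools import product

FACTORS = "abc"


def expand(contrast):
    """Expand a symbolic product such as AB = (a - 1)(b - 1)(c + 1)
    into a mapping from each treatment symbol to its sign."""
    terms = {}
    # From each bracket take either the unit term (0) or the letter (1).
    for picks in product((0, 1), repeat=len(FACTORS)):
        sign = 1
        letters = ""
        for f, pick in zip(FACTORS, picks):
            if pick:
                letters += f
            elif f.upper() in contrast:
                # Unit term taken from a bracket of the form (x - 1).
                sign = -sign
        terms["(" + (letters or "1") + ")"] = sign
    return terms


for name in ("A", "B", "C", "AB", "BC", "CA", "ABC"):
    plus = [t for t, s in expand(name).items() if s > 0]
    minus = [t for t, s in expand(name).items() if s < 0]
    print(name, "=", " + ".join(plus), "-", " - ".join(minus))
```

For A this reproduces the sum of the four treatments containing a, less the four which do not.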


44. Design suited to Confounding the Triple Interaction 

Such a set of eight treatments might be not 
inconveniently tested in groups of eight experimental 
trials, on relatively homogeneous material, or on 
blocks of land, each containing 8 plots. If, however, 
we decide in advance that the whole value of the 
experiment lies in the simple contrasts A, B, C, and 
in the interactions between pairs of factors, AB, 
BC, and CA, while the interaction between all 3 
factors, ABC, is unimportant and may be neglected, 
then it is possible to divide the land into blocks of 
only four plots each, or, in general, to subdivide the 
experimental material in groups of four, choosing thus 
more homogeneous groups than could be obtained 
if 8 had to be included in each group. To do this 
we notice that the particular interaction ABC, which 
we are willing to sacrifice, is a simple contrast between 
one particular four of our eight treatments, namely, 

(abc) + (a) + (b) + (c), 

and the remaining four 

(ab) + (ac) + (bc) + (1), 

as in Fig. 1. 

[Fig. 1: two blocks of four plots, one containing (abc), (a), (b), (c) and the other (ab), (ac), (bc), (1).] 

Fig. 1. Diagram showing the arrangement of eight treatments in 
two complementary blocks belonging to the same replication. 

If, therefore, these sets of four treatments are 
grouped together in blocks, and two such blocks be 
assigned to each replication, the contrast between the 
blocks within each replication is merely the treatment 
contrast which we are willing to sacrifice. This 
contrast has, therefore, been confounded indistinguish- 
ably with the differences between blocks, which are 
intended to be eliminated from the experimental errors 
of the comparisons we wish to make, and from the 
estimate of these errors. The remaining contrasts, 
representing single factors, or interactions between 
pairs of them, though none of them can be made by 
comparisons within a single block, are all built up 
by combining a contrast of one pair of treatments 
and another in the same block, with a similar contrast 
inside the other block of the replication. The reader 
should satisfy himself that this is so by examining 
each of these contrasts. It is then apparent that the 
errors to which these contrasts are subject arise solely 
from heterogeneity within the sets of four trials 
constituting the blocks, and that the differences 
between different blocks contribute nothing to the 
experimental error. By confounding the one unim- 
portant contrast with the differences between blocks, 
it is therefore possible to evaluate the six more 
important contrasts with whatever added precision 
is attained by using more homogeneous material. 
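
The reader who prefers to make this examination mechanically may use a sketch such as the following. It is an illustration under stated assumptions, not the author's procedure: treatments are written as plain strings, with "1" for the control, and the names `sign` and `block_totals` are hypothetical. A contrast is free of block differences when its positive and negative signs balance within each block separately.

```python
# The two complementary blocks of one replication.
block_1 = ["abc", "a", "b", "c"]
block_2 = ["ab", "ac", "bc", "1"]


def sign(contrast, treatment):
    """Sign of a treatment in a contrast: one change of sign for every
    factor of the contrast whose letter is absent from the treatment."""
    s = 1
    for f in contrast.lower():
        if f not in treatment:
            s = -s
    return s


def block_totals(contrast):
    """Sum of the signs of the contrast within each block."""
    return tuple(
        sum(sign(contrast, t) for t in block) for block in (block_1, block_2)
    )


for name in ("A", "B", "C", "AB", "BC", "CA", "ABC"):
    print(name, block_totals(name))
```

A total of zero in both blocks means that any constant added to all the plots of a block leaves the contrast unchanged; only ABC, which is wholly positive in one block and wholly negative in the other, is confounded.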

45. Effect on Analysis of Variance 

As an example, let us suppose such a test were 
carried out with 5 replications. There would then be 
ten blocks, and 9 degrees of freedom belong to the 
contrasts between these blocks, which have been 
eliminated from the experimental error. Of these 9, 
one may be identified with the treatment contrast 
which has been confounded. The three independent 
comparisons, within each of the ten blocks of 4, 
must be assigned 30 degrees of freedom. Of these, 6 
stand for the treatment contrasts evaluated, and the 
remaining 24 for the experimental error available 
for estimating the precision of the experiment, and 
for testing the significance of any particular result. 
The analysis of any such experiment is, therefore, 
of the form given below : — 


TABLE 14 
                         Degrees of 
                         Freedom. 
Blocks . . . . . .            9 
Treatments . . . .            6 
Error  . . . . . .           24 
Total  . . . . . .           39 


It is often instructive, and affords a useful check in 
more confusing examples, to see how the components 
ascribable to error may be obtained independently, 
rather than only by subtraction from the total. In 
this case there are two sets of five blocks, each 
containing identical treatments. The three differences 
among these treatments have, therefore, each been 
evaluated 5 times, and the 4 discrepancies between 
these 5 values will give 12 differences due wholly to 
error. Equally, the other set of five blocks contribute 
the other 12 degrees of freedom for error, making 24 in 
all, the total required. 

Without confounding, the analysis of the 
experiment would read: 

TABLE 15 
                         Degrees of 
                         Freedom. 
Blocks . . . . . .            4 
Treatments . . . .            7 
Error  . . . . . .           28 
Total  . . . . . .           39 

so that the effect of the subdivision into blocks has 
been to eliminate 5 additional degrees of freedom, one 
from the treatments, and four from the error. If 
greater homogeneity has in fact been obtained from 
the subdivision, the components by which the error 
has been diminished will have carried away a 
disproportionate share of the residual variation. 

The subdivision of each replication into two or 
more blocks does not prevent, when this is desired, 
the isolation, among the degrees of freedom assigned 
to error, of the particular components of error which 
affect any chosen comparison within the blocks. 
Since, however, the comparisons in which we are 
interested, such as A, are built up of comparisons 
within blocks of different kinds, they are equally 
affected by the components of error within each 
kind of block. Thus, in our present example, 
instead of 4 degrees of freedom only being available 
for the estimation of error of the comparison A, 
8 are available, and these 8 are the same as 
affect the precision of the interaction BC. Thus 
the 24 degrees of freedom for error are divisible 
into three sets of 8 each, appertaining to three 
pairs of comparisons among the degrees of freedom 
ascribed to treatments. 
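
The two analyses, with and without confounding, differ only in how the 39 degrees of freedom are apportioned. The sketch below is illustrative; the function `anova_df` and its parameter names are assumptions, not an established routine.

```python
def anova_df(replications, treatments=8, blocks_per_replication=1, confounded=0):
    """Degrees of freedom for blocks, treatments and error, where
    `confounded` treatment contrasts are sacrificed to block differences."""
    plots = replications * treatments
    total = plots - 1
    blocks = replications * blocks_per_replication - 1
    treat = treatments - 1 - confounded
    error = total - blocks - treat
    return {"blocks": blocks, "treatments": treat, "error": error, "total": total}


# Five replications, each divided into two blocks of four, ABC confounded.
with_confounding = anova_df(5, blocks_per_replication=2, confounded=1)

# Five replications, each occupying a single block of eight.
without_confounding = anova_df(5)

# Independent count of the error: 2 kinds of block, 3 differences within
# each, and 4 discrepancies among the 5 evaluations of every difference.
direct_error = 2 * 3 * (5 - 1)

print(with_confounding, without_confounding, direct_error)
```

The independent count agrees with the value found by subtraction, which is the check recommended in the text.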

46. Example with 27 Treatments 

The principle of procedure illustrated in the last 
example may be extended and generalised in a large 
number of ways. Since only a few of these can be 
exemplified, the reader will find great advantage in 
investigating the possibilities of similar designs appro- 
priate to the special problems in which he is interested. 
The variety of the subject is, in fact, unlimited, and 
probably many valuable possibilities remain to be 
discovered. We will consider next an experiment 
with 3 factors, in which each furnishes not two 
but three variants, so that there are in all 27 
combinations to be investigated. Thus a large scale 
investigation of the manurial requirements of young 
rubber plantations, in respect to the three primary 
manurial elements, nitrogen, potassium, and phos- 
phorus, combines, in addition to whatever basal 
treatment may be thought desirable, single or double 
applications of these three manures ; making in all 
three levels for each ingredient, such as nitrogen only, 
nine combinations for any two ingredients, and 27 
for all three together. 

In order to reduce the size of the block below that 
needed to contain 27 plots, we have to guide us in 
choosing a smaller size, the fact that blocks of nine 
will in any case be needed, if all the interactions 
between pairs of factors are to be conserved. The 
experiment will therefore have nine plots to a block, 
and three blocks in each complete replication. 
Everything now depends on the choice of the sets of nine 
treatments which are to be assigned to the same 
blocks, and, of course, within each block to strictly 
randomised positions. In order to conserve the main 
effects and the interactions between pairs, a set of 
nine treatments chosen to occupy the same block, 
must fulfil the following requirements : — 

(i) the three levels of each ingredient must be 
represented by three plots each, 

(ii) the nine combinations of each pair of 
ingredients must be represented by one plot each. 

If the set satisfies the second group of conditions it 
satisfies also the first. It is not, however, at first sight 
obvious that the second condition can be fulfilled at 
once for all three pairs of factors. The combinatorial 
relationship exhibited in the Latin square may here 
be applied most valuably. 

Let us set out the nine combinations to be chosen 
in a diagrammatic square with three rows and three 
columns, and let us lay it down that treatments in the 
first row shall receive nitrogen at the first level, 
treatments in the second row at the second level, 
and treatments in the third row at the third level. 
Then the requirement that the three levels of nitrogen 
shall be equally represented in our selection is 
satisfied easily by having three plots in each row of 
our diagram. Similarly, we may lay it down that 
the columns of our square correspond to the levels of 
abundance of the second ingredient, potash, in the 
manurial mixture to be tested. It is then clear that 
we must have three plots in each column, and if the 
interactions of nitrogen and potash are to be conserved, 
that there must be one plot at the intersection of each 
row with each column. With respect to the level at 
which phosphate is applied to any plot, we cannot 
now represent it by its position in our diagrammatic 
square, but shall simply use the numbers 1, 2, 3 
inserted in any position in the diagram to represent 
the level of the phosphatic ingredient. 

It now appears that a selection of nine treatments 
will satisfy the conditions laid down above, if it can 
be represented diagrammatically by a square contain- 
ing nine numbers at the intersections of the three 
rows with the three columns, of which every row 
must contain 1, 2 and 3 once each, in order that 
the interaction of nitrogen and phosphorus should be 
conserved ; and every column equally must contain 
a “ 1,” a “ 2 ” and a “ 3,” to make sure of conserving 
the interaction of potash and phosphorus. We have 
in fact merely to arrange the numbers 1, 2, 3 in a 
Latin square in order to obtain a single selection of 
the treatments, which might properly occupy a single 
block. 


There are only 12 solutions of the 3 × 3 Latin square.
If we choose one of these to represent the contents of
one block, we must next enquire whether any selection
of treatments to occupy the other blocks in the
same replications can be made to satisfy the con- 
ditions. We may convince ourselves on this point, 
by considering the effect on our chosen selection 
of making a cyclic substitution of the levels of 
phosphate; that is by substituting 2 for 1, 3 for 2, 
and 1 for 3 throughout the diagram. Repeating such 
a substitution three times will clearly bring us back 
to the original selection ; but the two new selections 
first produced will (i) both be represented by Latin 
squares, and (ii) will between them and the original 
from which they were derived contain all 27 treat- 
ments. This last essential fact becomes clear on 
perceiving that the number in any particular cell of 
the square must take the values 1, 2, 3 on successive 
applications of the substitution, whatever may be the 
initial value, while the aggregate of the 27 treatments 
used are represented simply by these three numbers, 
placed at all the nine points of the diagram. 
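
The argument of the last paragraph can be checked mechanically. The sketch below is only an illustration: the function names are hypothetical, and the starting square is one arbitrary choice among the twelve. Rows stand for nitrogen levels, columns for potash levels, and entries for phosphate levels, as in the diagram described above.

```python
from itertools import product

# Rows index the nitrogen level, columns the potash level, and each entry
# is the phosphate level (an arbitrary 3 x 3 Latin square to start from).
square = [[1, 2, 3],
          [2, 3, 1],
          [3, 1, 2]]

def cyclic_substitution(sq):
    """Substitute 2 for 1, 3 for 2 and 1 for 3 throughout the square."""
    return [[v % 3 + 1 for v in row] for row in sq]

def is_latin(sq):
    """True if every row and every column holds 1, 2, 3 once each."""
    full = {1, 2, 3}
    return (all(set(row) == full for row in sq)
            and all(set(col) == full for col in zip(*sq)))

def treatments(sq):
    """The nine (nitrogen, potash, phosphate) combinations of one block."""
    return {(n + 1, k + 1, sq[n][k]) for n in range(3) for k in range(3)}

# The original selection and the two selections derived from it.
blocks = [square,
          cyclic_substitution(square),
          cyclic_substitution(cyclic_substitution(square))]

all_27 = set(product((1, 2, 3), repeat=3))
covered = set().union(*(treatments(b) for b in blocks))

print(all(is_latin(b) for b in blocks))   # each block is a Latin square
print(covered == all_27)                  # together they hold all 27 treatments
```

A third application of `cyclic_substitution` returns the original square, which is why only three blocks arise.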

The 26 independent comparisons among 27 treat- 
ments may be analysed according to the factors 
involved, as in the table below : — 

TABLE 16

                    Degrees of
                    Freedom.

    N   . . . . . .      2
    K   . . . . . .      2
    P   . . . . . .      2
    NK  . . . . . .      4
    KP  . . . . . .      4
    NP  . . . . . .      4
    NPK . . . . . .      8

    Total . . . . .     26

If, therefore, we make up the contents of the block in
accordance with the solution provided by the Latin
square diagram, we shall have sacrificed a particular
2 out of the 8 degrees of freedom for triple
interaction.

It is, of course, always easy to recognise the 
particular components of treatment in which blocks 
in the same replication differ, and to obtain the 
aggregate sum of squares, for the six triple 
interactions which have not been confounded, by 
subtraction. This residue of unconfounded inter- 
actions may then be tested for significance like other 
treatment effects, and will usually be of service in 
confirming experimentally the supposition upon 
which the experimental design was based, namely, 
that the triple interactions as a whole had not been 
quantitatively important. 

There is, however, a certain advantage in being 
able to recognise which particular contrasts these
unconfounded interactions represent. For then we 
can, if we wish, subdivide them to examine the 
significance of more particular effects. In the case 
of the design under discussion, based on a 3x3 
Latin square, the combinatorial properties of such a 
square are such as to make this recognition easy. It 
has been mentioned above that there are only twelve 
3x3 squares, and we have seen that each belongs to 
a set of 3 which can be generated from it by a cyclic 
substitution. There are, therefore, only 4 such sets, 
and the 4 squares below are representatives chosen
one from each set.

       I.          II.         III.        IV.

     1 2 3       1 3 2       1 2 3       1 3 2
     2 3 1       3 2 1       3 1 2       2 1 3
     3 1 2       2 1 3       2 3 1       3 2 1

It may be observed that the second representative
is formed from the first by interchanging the numbers
2 and 3; the third is formed from the first by
interchanging the second and third rows, and the fourth
is formed from the first by interchanging the second
and third columns. If we consider, now, examples
Nos. I. and II. it is to be observed that they agree
in the three plots having phosphate at level No. 1,
but differ in the other six. If, however, we applied
the cyclic substitution to the first example, it would,
after one operation, agree with the second example in
the three plots at phosphate level No. 3, and differ in
the other six; while after a second operation it would
agree only in the three plots at phosphate level No. 2.
Consequently, the nine treatments in any selection of
set II. appear three each in the three selections of
set I., these sets of treatments being those having the
same quantity of phosphate. The treatment comparisons
represented by the subdivision of the 27 treatments
as in set II. are therefore wholly independent of the
treatment comparisons of set I. Both represent 2
degrees of freedom out of the 8 available for triple
interaction, and these 2 pairs of degrees of freedom
have nothing in common. Consequently, when the
pair of degrees of freedom of set I. are confounded,
the pair represented by set II. are wholly conserved.

The same relationship subsists between set III.
and set I. if we consider the treatments represented
in the same row, i.e., those of an equal level of
nitrogen, instead of those represented by the same
number, at an equal level of phosphate. The
examples shown above agree in the first row and
differ in the two other rows. If the cyclic
substitution be applied to the first example it will agree
with the third successively in the second and third
rows, while always differing in the remaining two.
Consequently, the third set, like the second, represents
a treatment contrast wholly independent of that
represented by the first, and which is therefore entirely
conserved if the latter is confounded. In like manner
the treatments in any selection of set IV. are
distributed by threes in the selections of set I.,
these threes lying in the same column and having
therefore equal quantities of potash. The 2 degrees
of freedom of set IV. are also therefore wholly
conserved. It should be noticed further that the three
sets II., III. and IV. are not only independent of
set I., but are also independent of each other.
In the examples shown, II. and III. agree in a single
column, II. and IV. in a single row, and III. and
IV. in a single number; and, in view of what has
been said above, these facts suffice to show that the
pairs of degrees of freedom represented by these sets
are wholly independent and have nothing in common.
Since all are included in the 6 degrees of freedom
conserved, they must therefore constitute the whole
of the 6 degrees of freedom, and constitute parts
which may be separated in the analysis, and examined
separately in the test of significance.
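
The combinatorial statements of this section (twelve squares, four cyclic sets, mutual independence of the sets) lend themselves to a brute-force check. The following is a minimal sketch with hypothetical helper names; "independent" is taken here to mean that every selection of one set shares exactly three treatments with every selection of another, which is the property used in the argument above.

```python
from itertools import permutations

def columns_are_latin(sq):
    """Rows are permutations already, so only the columns need checking."""
    return all(set(col) == {1, 2, 3} for col in zip(*sq))

# Enumerate all 3 x 3 Latin squares as triples of row permutations.
rows = list(permutations((1, 2, 3)))
squares = [sq for sq in permutations(rows, 3) if columns_are_latin(sq)]

def shift(sq):
    """Cyclic substitution: 2 for 1, 3 for 2, 1 for 3."""
    return tuple(tuple(v % 3 + 1 for v in row) for row in sq)

def cyclic_set(sq):
    """The set of three squares generated from sq by cyclic substitution."""
    return frozenset({sq, shift(sq), shift(shift(sq))})

def treatments(sq):
    """(row, column, number) triples, i.e. the nine treatments of a block."""
    return {(n, k, sq[n][k]) for n in range(3) for k in range(3)}

sets = {cyclic_set(sq) for sq in squares}

def independent(set_a, set_b):
    """Each selection of set_a meets each selection of set_b in 3 treatments."""
    return all(len(treatments(a) & treatments(b)) == 3
               for a in set_a for b in set_b)

all_independent = all(independent(a, b) for a in sets for b in sets if a != b)

print(len(squares), len(sets), all_independent)
```

Since four independent pairs of degrees of freedom account for 8 in all, the four sets between them exhaust the triple interaction.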

Supposing, then, that the experiment were carried
out with 12 replications, or, in all, 324 plots, we might
choose one of the 4 pairs of degrees of freedom into
which the 8 triple interactions have been divided, and
decide to sacrifice these particular components of the
triple interactions, in order to increase the precision
of the comparisons to be made in the 6 components
of single factor effects, the 12 components of
interactions between pairs of factors, and the 6 components
of triple interactions which have not been confounded.
The experiment then consists of 36 blocks of 9 plots
each, so that 35 degrees of freedom are eliminated as
representing block differences. The sets of treatments
in different blocks within the same replication are
assigned by using one of the cyclic sets of 3 × 3 Latin
squares; remembering, of course, that topographically
these treatments are not arranged in a Latin square,
but are assigned at random to the 9 plots in the
block. Each set of 9 treatments replicated 12 times
will provide 88 degrees of freedom for error, or 264
in all, so that the complete analysis of the experiment
may be shortly represented as below:


TABLE 17

                                          Degrees of
                                          Freedom.

    Blocks . . . . . . . . . . . . . .        35
    Single factors . . . . . . . . . .         6
    Interactions between pairs . . . .        12
    Unconfounded triple interactions .         6
    Error  . . . . . . . . . . . . . .       264

    Total  . . . . . . . . . . . . . .       323
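
The partition of the 323 degrees of freedom can be reproduced by simple counting. The sketch below uses hypothetical variable names and assumes only the figures stated in the text (27 treatments, 12 replications, blocks of 9 plots, 2 triple-interaction degrees of freedom confounded).

```python
replications = 12
levels = 3            # three levels of each of N, K and P
block_size = 9

plots = replications * levels ** 3            # 324 plots in all
blocks = plots // block_size                  # 36 blocks

df_total = plots - 1
df_blocks = blocks - 1
df_single = 3 * (levels - 1)                  # N, K, P: 2 each
df_pairs = 3 * (levels - 1) ** 2              # NK, KP, NP: 4 each
df_triple_unconfounded = (levels - 1) ** 3 - 2  # 8 less the 2 confounded

df_error = (df_total - df_blocks - df_single
            - df_pairs - df_triple_unconfounded)

# The same error total, reached as in the text: each set of 9 treatments
# occupies 12 blocks (108 plots), leaving 107 - 11 - 8 comparisons for error.
error_per_set = (replications * block_size - 1) - (replications - 1) - (block_size - 1)

print(df_blocks, df_single, df_pairs, df_triple_unconfounded, df_error, df_total)
```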
47. Partial Confounding

In the example we have just considered with 27
treatments it has been shown that we can gain the
great advantage of smaller blocks, or increased
homogeneity of material, for all the primary
comparisons, and their interactions by pairs, at the
expense of some sacrifice of information about
the triple interactions, which are presumed to be
comparatively unimportant. The advantage of such
a procedure would be great in many practical cases,
even if all knowledge of the interactions of a higher
order had to be foregone. Formally, however, the
typical experiment discussed has shown that the
sacrifice required is that of one only of four portions
into which the triple interactions may be divided,
and that we may sacrifice whichever one we please
of these four portions. If, now, it is thought that
knowledge of these interactions, though admittedly
comparatively unimportant, is not wholly worthless,
the fact that only one-quarter has to be sacrificed will
appear to be a real advantage. This advantage is
not, however, made fully accessible by the experiment
proposed; for the 6 degrees of freedom conserved,
while they afford satisfactory guidance as to the
significance or insignificance of such triple
interactions as may exist, represent manurial contrasts of
a somewhat complex kind, and are not in fact the
components we should choose for separate
examination if the triple interactions had been conserved in
their entirety.
When the quantity of an ingredient of a mixture
has been tested at three different levels the two
independent comparisons which these provide may
often be usefully subdivided in a particular way.
We may regard the difference between the highest
level and the lowest as representing the principal
effect of the ingredient, that is, as giving the
average effect brought about by a unit addition of
this ingredient, averaged over the range of dosage
studied. In conjunction with this principal degree of
freedom we may introduce a second, orthogonal to,
or statistically independent of, the first. This will be
found by subtracting the sum of the effects of the
first and third level from twice the effect of the second
level of concentration. Thus if (n₁), (n₂), (n₃) stand
for different levels of the amount of nitrogen in the
manurial mixture, or for the measured effects of such
treatments, the two components of the effect of
nitrogen which may conveniently be separated are
defined by

    N₁ = (n₃) − (n₁)

and

    N₂ = 2(n₂) − (n₁) − (n₃).

These forms at least will be convenient when the
concentrations tested differ by equal steps, or by
steps which, on any hypothesis under consideration,
should produce equal effects. They may be modified
to other orthogonal linear forms when the relationship
between the quantities used experimentally is of a
more complicated character. Here we are concerned
only to illustrate the statement that when high order
interactions are regarded as having any experimental
importance, our interest will usually be centred on
particular components into which such interactions
may be analysed. The statistical independence of
the two forms proposed above may be conveniently
verified by multiplying together the coefficients of
(n₁) in the two expressions, and adding the product
so formed to the corresponding products for the
coefficients of (n₂) and (n₃). If these products
add up to zero the components designated are
statistically independent and represent mutually
exclusive degrees of freedom.
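
The verification just described amounts to forming an inner product of the two sets of coefficients. A minimal sketch follows; the tuples list the coefficients of (n₁), (n₂), (n₃) in that order, and the yields used at the end are purely hypothetical figures chosen for illustration.

```python
# Coefficients of (n1), (n2), (n3) in the two components of the text.
N1 = (-1, 0, 1)     # N1 = (n3) - (n1)
N2 = (-1, 2, -1)    # N2 = 2(n2) - (n1) - (n3)

def sum_of_products(a, b):
    """Multiply corresponding coefficients and add the products."""
    return sum(x * y for x, y in zip(a, b))

def evaluate(coefficients, yields):
    """Value of a linear component for given yields at the three levels."""
    return sum_of_products(coefficients, yields)

# Zero here means the two components are statistically independent.
print(sum_of_products(N1, N2))

# Hypothetical yields at the three levels, for illustration only.
example_yields = (10, 14, 16)
print(evaluate(N1, example_yields), evaluate(N2, example_yields))
```

With these made-up yields N₁ is 6 (the principal effect) and N₂ is 2 (a positive value indicating a diminishing return at the higher dose).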

Considering the 4 degrees of freedom for the
interaction of two ingredients, such as nitrogen and
potash, it is now readily seen that these can be
denoted by the four symbols, N₁K₁, N₂K₁, N₁K₂
and N₂K₂, any one of which may be interpreted
in terms of the treatment concerned by algebraic
expansion. Thus:

    N₁K₁ = {(n₃) − (n₁)}{(k₃) − (k₁)}
         = (n₃k₃) − (n₁k₃) − (n₃k₁) + (n₁k₁),

and so with other expressions. It will be seen at once
that, if our interest in the interaction between
nitrogenous and potassic treatments arises principally from
a suspicion that, with a larger supply of nitrogen,
there may be a greater need for or opportunities
for the utilisation of potash, then the particular
component N₁K₁ will have an interest which the
other components from which it has been separated
do not share. Similarly, with triple interactions it
might well be that the sole scientific interest of the
eight independent comparisons which, in our
experiment, these afford, lay in one particular component
such as N₁K₁P₁.
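
The algebraic expansion of an interaction component is simply the product of the coefficients of its factors, taken over every combination of levels. The sketch below, with hypothetical names, builds the four components of the nitrogen-potash interaction in this way and checks that they are mutually orthogonal.

```python
from itertools import combinations

LINEAR = {1: -1, 2: 0, 3: 1}       # form of N1 or K1: (x3) - (x1)
QUADRATIC = {1: -1, 2: 2, 3: -1}   # form of N2 or K2: 2(x2) - (x1) - (x3)

def interaction(n_part, k_part):
    """Coefficient of each treatment (n_i k_j) in the product component."""
    return {(i, j): n_part[i] * k_part[j] for i in n_part for j in k_part}

components = {
    "N1K1": interaction(LINEAR, LINEAR),
    "N2K1": interaction(QUADRATIC, LINEAR),
    "N1K2": interaction(LINEAR, QUADRATIC),
    "N2K2": interaction(QUADRATIC, QUADRATIC),
}

def orthogonal(a, b):
    """True if the products of corresponding coefficients add up to zero."""
    return sum(a[cell] * b[cell] for cell in a) == 0

# Only four treatments enter N1K1, with the signs of the expansion above.
n1k1_terms = {cell: c for cell, c in components["N1K1"].items() if c != 0}
print(n1k1_terms)

print(all(orthogonal(components[a], components[b])
          for a, b in combinations(components, 2)))
```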

The inconvenience of the confounding process
used in the experiment consists, therefore, in the fact
that if the triple interaction, or any component of it,
is possibly of sufficient magnitude to be not wholly
negligible, the components of triple interaction
conserved by the experiment will not probably be
themselves of any special interest; and in the
absence of the two components which have been
confounded, will not afford the means of isolating
the more interesting components for special study.

It would seem, therefore, that it would have been
preferable, if possible, to have spread such
information as the experiment is designed to give respecting
the triple interactions, equally over the 8 degrees of
freedom of which they are composed, unless the
structure of the experiment is itself such as to isolate
for conservation just those components which are of
the greatest interest. The process of spreading
available information equally over the whole group
of comparisons which are affected is known as partial
confounding.

In our example there are 12 replications. If the
number of replications, as in this case, is divisible by
4, then, instead of completely confounding a chosen
pair of degrees of freedom out of the four pairs
available, we might partially confound all four, by using
each cyclic set three times instead of using the same set
twelve times. The treatment comparison represented
by any cyclic set will then have been conserved in
9 replications, while it has been sacrificed in 3.
All these comparisons will therefore be capable of
evaluation from the results of the experiment, though
with only three-quarters of the precision with which
interactions between pairs of factors, and the effects
of single factors have been evaluated. In such an
arrangement the general advantage conferred by the
principle of confounding may be most clearly seen,
for the reduction in the size of the block from 27 plots
to 9 will probably have increased the precision of the
unconfounded comparisons in a higher ratio than that
of 4 : 3; and, as the triple interactions are only
confounded in 1 replication out of 4, they also will
be evaluated with increased, but more moderately
increased, precision, in spite of the quarter of the
information respecting them which has been sacrificed
in order that the blocks might be reduced to 9 plots.

The information concerning the triple interactions
which has been supplied by the experiment, is based
on those parts of the experiment from which they
are available, without using those parts in which
they are confounded. By this means the comparison
is kept free from the larger elements
of soil heterogeneity, which distinguish the different
blocks. Thus the contrast between the three sets of
treatments

    1 2 3       2 3 1       3 1 2
    2 3 1       3 1 2       1 2 3
    3 1 2       1 2 3       2 3 1

may be obtained by adding up the aggregate response 
to these sets of treatments, in the 9 replications in 
which these contrasts of manurial treatment are not 
those which characterise whole blocks. A different 
set of 9 replications will provide the material for 
evaluating the contrasts between the sets of treatments 
of the second group, namely, 



    1 3 2       2 1 3       3 2 1
    3 2 1       1 3 2       2 1 3
    2 1 3       3 2 1       1 3 2



while different sets of nine replications will give the 
contrasts between the cyclic derivatives obtained from 
the squares 

    1 2 3             1 3 2
    3 1 2     and     2 1 3
    2 3 1             3 2 1


If, now, we are specially interested to evaluate a
particular component of the triple interaction, such
as that which has been denoted above by N₁K₁P₁,
we must obtain this by a combination of the contrasts
which the experiment provides directly. To do this
may require some ingenuity. The solution in this
case is found by using only those squares in which
one diagonal, or the other, contains plots with the
highest or lowest level of phosphate. Thus, if we
compound with the proper positive and negative signs
the yields given by the experiment for the 8 squares
set out below, it appears that





      2 3 1       3 2 1       3 1 2       3 2 1
    + 3 1 2   +   2 1 3   +   2 3 1   +   1 3 2
      1 2 3       1 3 2       1 2 3       2 1 3

      1 2 3       2 1 3       1 2 3       1 3 2
    − 2 3 1   −   1 3 2   −   3 1 2   −   2 1 3
      3 1 2       3 2 1       2 3 1       3 2 1

is equal to 3 N₁K₁P₁.
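
A combination of this kind can be checked by counting how often each of the 27 treatments enters with a positive and with a negative sign. The sketch below is an independent check and rests on assumptions: the eight squares are taken to be all those 3 × 3 Latin squares having one diagonal wholly at phosphate level 1 or wholly at level 3, rows are nitrogen levels, columns are potash levels, and the signs were found by requiring the result to equal three times the product contrast (n₃ − n₁)(k₃ − k₁)(p₃ − p₁). The names `PLUS` and `MINUS` are hypothetical.

```python
from itertools import product

def treatments(square):
    """The nine (nitrogen, potash, phosphate) combinations of one square."""
    return {(n + 1, k + 1, square[n][k]) for n in range(3) for k in range(3)}

# Squares whose total yield is taken with a positive sign ...
PLUS = [((2, 3, 1), (3, 1, 2), (1, 2, 3)),
        ((3, 2, 1), (2, 1, 3), (1, 3, 2)),
        ((3, 1, 2), (2, 3, 1), (1, 2, 3)),
        ((3, 2, 1), (1, 3, 2), (2, 1, 3))]
# ... and those taken with a negative sign.
MINUS = [((1, 2, 3), (2, 3, 1), (3, 1, 2)),
         ((2, 1, 3), (1, 3, 2), (3, 2, 1)),
         ((1, 2, 3), (3, 1, 2), (2, 3, 1)),
         ((1, 3, 2), (2, 1, 3), (3, 2, 1))]

LINEAR = {1: -1, 2: 0, 3: 1}   # coefficients of the form (x3) - (x1)

# Net coefficient of every treatment in the signed sum of square totals.
coeff = {t: 0 for t in product((1, 2, 3), repeat=3)}
for sign, group in ((1, PLUS), (-1, MINUS)):
    for square in group:
        for t in treatments(square):
            coeff[t] += sign

# Three times the coefficient of each treatment in N1 K1 P1.
target = {(n, k, p): 3 * LINEAR[n] * LINEAR[k] * LINEAR[p]
          for (n, k, p) in coeff}

identity_holds = coeff == target
print(identity_holds)
```

Every treatment involving a middle level of any ingredient cancels, and each of the eight extreme combinations is left with coefficient +3 or −3.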

With the aid of this example the reader will do
well to consider how the data of the experiment
should be combined to obtain other types of
interaction, such as those denoted by N₂K₁P₁, N₂K₂P₁
and N₂K₂P₂, and to satisfy himself that these can
each be derived by a similar choice of appropriate
compounds from the data provided by the partially
confounded experiment.
SPECIAL CASES OF PARTIAL CONFOUNDING

48. Treating of a subject such as experimental design
in general it is possible to give adequate space only to
general principles leading to the more advantageous
procedures which are available. These it is essential
to grasp. Their applications to particular details that
arise in practice are of endless variety and afford
scope for a great deal of ingenuity. These require to
be studied in detail by workers in different fields of
experimentation in order to reap the full advantages
which a clear grasp of general principles makes
possible. It may be of use in this chapter if we
consider some of the more special applications of the
principle of partial confounding which were found to
arise in their early application to field trials in
agriculture.

49. Dummy Comparisons

It may happen that, in order that the different
variants of each factor may occur with proportional
frequency in combination with the variants of other
factors, certain of the combinations used are actually
indistinguishable. For example, in an experiment
with four different nitrogenous manures we may also
wish to vary the quantities used. We may wish to
compare plots receiving no nitrogenous manure with
others receiving a single or a double dressing. These
single and double dressings will be applied to different
plots, in each of the four nitrogenous materials to be
tested, and the precision of the comparison between
single and double applications will be enhanced by
the fact that each is represented on four kinds of
plots. In order that the comparison with the plots
receiving no nitrogenous manure may be of equal
precision, it is necessary that these shall be as numerous
as those receiving single or double dressings, and
therefore four times as numerous as any one kind
of these. To compare the efficacy of the four kinds
or qualities of nitrogen simultaneously with the three
quantities (0, 1, 2) with which they can be combined,
we might make blocks of 12 plots each, in which
the plots receiving single or double dressings will
be manured differently, while the 4 plots receiving
none will all be manured alike. The comparisons
among these within each block will be ascribable
solely to experimental error, including in that term,
as is usual, variations in the fertility of different plots
in the same block. Thus, if there were 5 replications,
the analysis of the 59 independent comparisons
among the 60 plots would not be

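TABLE 18

                              Degrees of
                              Freedom.

    Blocks . . . . . . . .         4
    Treatments . . . . . .        11
    Error  . . . . . . . .        44

    Total  . . . . . . . .        59

but

TABLE 19

                              Degrees of
                              Freedom.

    Blocks . . . . . . . .         4
    Treatments . . . . . .         8
    Error, between blocks          32
    Error, within blocks .        15

    Total  . . . . . . . .        59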
Here we have divided the 47 degrees of freedom
available for the estimation of error into two parts,
to show that 15 degrees of freedom come from a
comparison of identical plots in the same block, 3
from each of the five blocks, while 32 come from the
comparison of the differences among the 9 different
treatments in the five blocks in which they are tried.
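
The two ways of counting can be set side by side. The sketch below, with hypothetical variable names, takes the figures of the example (5 blocks of 12 plots, 4 of them unmanured, hence 9 distinct treatments) and reproduces both the mistaken partition that treats the 12 plots as 12 different treatments and the proper one.

```python
blocks = 5
plots_per_block = 12
unmanured_per_block = 4
distinct_treatments = plots_per_block - unmanured_per_block + 1   # 8 + 1 = 9

df_total = blocks * plots_per_block - 1        # 59
df_blocks = blocks - 1                         # 4

# Mistaken reckoning: the 12 plots of a block counted as 12 treatments.
naive_treatments = plots_per_block - 1                         # 11
naive_error = df_total - df_blocks - naive_treatments          # 44

# Proper reckoning: only 9 treatments are distinguishable.
df_treatments = distinct_treatments - 1                        # 8
error_between = df_blocks * df_treatments                      # 32
error_within = blocks * (unmanured_per_block - 1)              # 15

print(naive_treatments, naive_error)
print(df_treatments, error_between, error_within)
```

The three treatment comparisons that disappear from the first reckoning are exactly the dummy comparisons among like plots, which reappear as 3 degrees of freedom of error in each block.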

As between the two factors of quantity of nitrogen
N, and quality Q, the 8 degrees of freedom between
the 9 treatments will be allotted as follows. There
will be 2 for the comparison of the three levels
of the nitrogen, and 3 for the four qualitatively
different mixtures in which it is applied, leaving 3
more for interactions between N and Q. In other
words we have, as we would have if the four manures
were applied only in single and double doses, 3 degrees
for quality and 3 for interaction. The addition of the
plots without the nitrogenous manure has left these
two classes unaffected, but has added 1 to the degrees
of freedom for quantity of nitrogen.


50. Interaction of Quantity and Quality 

In this connection a modification is to be indicated
in the manner in which the effects of quality and
interaction are to be reckoned. If we were to consider
N and Q as two independent factors, the 3 degrees of
freedom for interaction would be obtained simply by
comparison of the four quantities by which the double
applications of each manure exceed the effects of
single applications of the same materials. Equally,
the simple qualitative effects would be represented by
the contrast between the four totals of single and
double dressings of these four materials. Such a
subdivision is seen to be not wholly satisfactory
when we consider that the quantitative contrasts are
differences caused by quantitative variations in the
very substances which the qualitative comparisons
are intended to compare. Thus, if a quantity of
nitrogen applied as cyanamide differs in its effect on
the crop from an equal quantity of nitrogen applied as
urea, it is to be anticipated that with larger quantities
of the manurial applications the difference would be
enhanced. In fact, the hypothesis that the differences
are proportional to the quantities of nitrogen applied
is in many ways a simpler one, in the sense of being
more natural and acceptable, than that the difference
should be the same irrespective of the quantities of
material added to the soil.

If we take this view, results in which the double
dressings of two ingredients differ by twice as much
as the single dressings, but in the same direction, would
be regarded as exhibiting pure effects of quality Q,
without interaction NQ. The interactions must,
therefore, be identified with the three independent
comparisons among the four quantities which would
be obtained by subtracting the yield of the double
dressing from twice the yield of the corresponding
single dressing. For these four quantities are
equal when interaction is absent. Equally,
the primary effects of quality will now be reckoned
by comparing the four sums found by adding twice
the yield of the double application to the yield from
the single application of the same manure; as in
calculating the "regression" of the manurial response
upon the manurial difference to which it is for the
present purpose to be considered as proportional.
The statistical principles and methods in the
treatment of regression are developed in Statistical Methods
for Research Workers.

That these two methods give different subdivisions
of the same total follows from the algebraic identity

    ½(x + y)² + ½(x − y)² = ⅕(2x + y)² + ⅕(x − 2y)².

If x, y stand for the yields of the double and single
dressings of any manurial material, the two terms on
the left represent the squares assigned to Q and NQ,
using the convention that "interaction" means
variation in the values x − y, while the two terms on
the right represent the squares assigned to Q and NQ
on the convention that "interaction" means variation
among the values x − 2y. The same method of
subdivision with appropriate coefficients is evidently
applicable whatever may be the ratio between the
quantities used. Note that the divisor of each square
is the sum of the squares of the coefficients, while the
sum of the products of the coefficients in any two
squares of the same set is zero.
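
Both sides of the identity reduce to x² + y², so that the two conventions merely divide the same total differently. The sketch below checks this with exact fractions; the function names are hypothetical, and x and y are taken, as in the text, to be the yields of the double and single dressings.

```python
from fractions import Fraction

def split_by_difference(x, y):
    """Q and NQ squares when interaction means variation in x - y."""
    return Fraction((x + y) ** 2, 2), Fraction((x - y) ** 2, 2)

def split_by_proportion(x, y):
    """Q and NQ squares when interaction means variation in x - 2y."""
    return Fraction((2 * x + y) ** 2, 5), Fraction((x - 2 * y) ** 2, 5)

def same_total(x, y):
    """True if both subdivisions add up to the same total, x^2 + y^2."""
    return (sum(split_by_difference(x, y))
            == sum(split_by_proportion(x, y))
            == x * x + y * y)

# Divisors are the sums of squares of the coefficients: 1 + 1 and 4 + 1.
# Products of coefficients within each set add to zero.
orthogonal_left = 1 * 1 + 1 * (-1)       # (x + y) against (x - y)
orthogonal_right = 2 * 1 + 1 * (-2)      # (2x + y) against (x - 2y)

print(all(same_total(x, y) for x in range(-5, 6) for y in range(-5, 6)))
print(orthogonal_left, orthogonal_right)
```

When the double dressing yields exactly twice the single (x = 2y), the second convention assigns the whole of the total to Q and nothing to NQ, which is the behaviour argued for above.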

51. Resolution of Three Comparisons among 
Four Materials 

The 3 degrees of freedom in Q or in NQ are the
three independent comparisons among four different
materials, such as sulphate of ammonia (s), chloride
of ammonia (m), cyanamide (c), and urea (u). These
may be systematically subdivided, if it is thought
convenient to do so, as the three possible comparisons
between opposing pairs of materials. There are in
fact just three ways of dividing four objects into two
sets of two each; these are:

    s + m − c − u
    s − m + c − u
    s − m − c + u,

and these are all mutually independent, as may be
verified by observing that the sum of the products of
the coefficients (+1, or −1) of the symbols in any
two of these three expressions is zero.
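
The three comparisons can be generated, rather than written down, by listing the ways of pairing the first material with one of the others. The following sketch uses hypothetical names and the letters s, m, c, u of the text.

```python
from itertools import combinations

materials = ("s", "m", "c", "u")

# Each division into two pairs is counted once by keeping "s" on the
# positive side; its partner then fixes the whole division.
contrasts = []
for pair in combinations(materials, 2):
    if "s" in pair:
        contrasts.append(tuple(1 if x in pair else -1 for x in materials))

def sum_of_products(a, b):
    """Sum of the products of corresponding coefficients."""
    return sum(x * y for x, y in zip(a, b))

mutually_independent = all(sum_of_products(a, b) == 0
                           for a, b in combinations(contrasts, 2))

for c in contrasts:
    print(c)
print(mutually_independent)
```

The coefficient tuples come out in the order s + m − c − u, s − m + c − u, s − m − c + u.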

Regarded combinatorially this is equivalent to the
statement that a 2 × 2 Latin square is possible, namely,

    A B
    B A,

for in such a square the four objects are divided into
pairs in three ways, as rows, as columns, and as
letters, and the specification of a Latin square requires
that these shall all be mutually independent.

52. An Early Example 

An experiment with sulphate of ammonia, chloride
of ammonia, cyanamide and urea, in quantities
0, 1, 2, and with and without superphosphate
was carried out in barley at Rothamsted, in 1927.
Two replications, or 48 plots, were used. These
were divided into four blocks of 12 plots each.
In two blocks phosphate (p) was applied with
chloride of ammonia and with urea, in both single
and double dressings, while in the other two blocks
it was applied with sulphate of ammonia and with
cyanamide. Each block contained two plots without
nitrogenous or phosphatic dressings, and two plots
with phosphatic only. The plots were assigned to
treatments at random within the blocks.

Among the 18 different treatments there will be
17 independent comparisons. One of these, however,
namely,

    (p − 1)(s₁ + s₂ − m₁ − m₂ + c₁ + c₂ − u₁ − u₂)

has been confounded with blocks. There are 16
degrees of freedom for treatments in the analysis,
and 3 for blocks, leaving 28 for error. It would,
however, be a mistake to assume from this that these
28 are all pure error, for it will appear that owing to
the occurrence of dummy treatments, or more properly
of plots treated alike in different blocks of the same
replication, the degree of freedom destined to be
sacrificed has in fact only been partially confounded.

It is instructive in such cases to consider exactly 
what comparisons will consist solely of error, un- 
affected by any treatment differences. Within each 
block there are two unmanured plots the difference
between which is pure error, and two phosphatic 
plots of which the same is true. Here, therefore, we 
have 8 degrees of freedom contributing only to our 
estimate of error. To make sure that these are not 
counted a second time, it is sufficient that all further 
comparisons to be made, if they involve these plots, 
shall involve only the pairs of plots treated alike
taken together. Next, observe that there are two
pairs of blocks with the same treatments, 10 in each.
The 10 differences between the performances of these
in the two blocks of a pair, will be distributed
about a mean representing the difference in fertility
between the two blocks; but the 9 degrees of freedom
of their variation about this mean will be pure error.
There are thus 9 degrees of freedom from each pair
of blocks, or 18 together, which, with the 8 from within
blocks, make 26 in all. In subsequent comparisons
we must, however, treat the two blocks of each pair
together. Finally, the two pairs of blocks have two
treatments in common, those unmanured and those
having phosphate only. The differences between these
two treatments in the two pairs of blocks will not be
necessarily the same, and the discrepancy between
them will be pure error; this last degree of freedom
makes up the total to 27.

Fig. 2. Arrangement of treatments and yields of grain in experiment
on quantity and quality of nitrogenous fertilisers in barley, 1927.

The yields of grain from each plot in units of 
2 oz. are shown in Fig. 2 ; the contributions to the 
sum of squares ascribable to these 27 degrees of 
freedom of pure error are shown in Table 20. 

Squares involving two plots only are divided by 
2, others such as the first two entries in the second 
and third columns depend on 4 plots, and are divided 
by 4, and the squares of differences between pairs of 
blocks by 24. Finally, the discrepancy of 47 units 
between (p) − (1) from the two pairs of blocks
depends on 16 plots, and has its square divided by 
16. The several ingredients are thus brought to a 
comparable basis. 
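
The bringing of the several ingredients to a comparable basis can be followed numerically. The sketch below uses hypothetical variable names; the eight within-block squares, the two totals for pairs of blocks (19518.25 and 13474.83) and the grand total are figures read from Table 20, and the first and last items are recomputed from the divisors given in the text.

```python
# Squares of differences between like plots in the same block (Table 20);
# each involves two plots only, and so is divided by 2.
within_block_squares = [841, 5476, 2209, 841, 4489, 1089, 256, 64]
ss_within = sum(within_block_squares) / 2

# Contributions of the two pairs of like blocks, 9 degrees of freedom each,
# taken as given in Table 20 (the second is quoted to two decimals).
ss_pairs = [19518.25, 13474.83]

# Discrepancy of 47 units between (p) - (1) in the two pairs of blocks,
# depending on 16 plots.
ss_discrepancy = 47 ** 2 / 16

degrees_of_freedom = 8 + 9 + 9 + 1
ss_error = ss_within + sum(ss_pairs) + ss_discrepancy

print(ss_within, ss_discrepancy)
print(degrees_of_freedom, round(ss_error, 2))
```

The total agrees, to the rounding of the tabulated items, with the 40763.64583 entered in the table for the 27 degrees of freedom of pure error.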

It was mentioned above that the single manurial 
contrast 

    (p − 1)(s₁ + s₂ − m₁ − m₂ + c₁ + c₂ − u₁ − u₂)

in which the blocks differ, had not been totally con- 
founded, meaning by this that it could be indirectly 
estimated by comparisons within blocks. The com- 
parison within blocks, independent of all those used 
in the estimation of error, which depends only on 
this one manurial contrast, arises from the fact that 
though the treatments concerned are not to be found 
in the same block, yet the different blocks in which 
they appear also contain some plots treated alike, 
with which each group can be compared. In each 
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TABLE 20 


Analysis of Components of hrror 


jSquares ot Differences 
! between like Plots in 
i the same Block. 


(P) ■ 

<0 

! (P) 
i (1) 
i (P) 
(1) 
(» 
(1) 


841 

5476 

2209 

84 > 
4489 

1089 

256 

64 

15265 


Squares of Differences between like 1 re.itments 
in like Blocks. 


5‘> 
- 7.18 
.hff'4 
289 
1 2544 
4 

529 

2304 

5184 

400 

45 <v 3 

26i)49-(> 


of Blocks. 


(P) 

(0 


; (p) 
(1) 


40 

104 

144 

97 

47 


(p) ■ 

7812-5 

(1) 

(0 

3200 

(P) 

(«1 p) ■ 

2809 

(i/,) 

(6) • 

1936. 

Ki p) 

(6) 

17424 

(s,p) 

( »‘\P ) ■ 

484 


(u 2 p) • 

1024 

+6) 

(**) 

529 

(f iP ) 

(s 2 ) ■ 

1 1449 

(Sip) 

(»‘ 2 P) 

1444 

("'ll 

Total . 

- 9075 

39036-5 

Total 

in Pairs 

Sutum.in <>1 ( 1 



urr 1 m 


I K-^ret's ol 

— 

A + D 

1* rtHtloin. 


151 

8 


+ 54 

9 


— 

9 


97 

1 47 ’ 

l6 

B + C 

27 



Sum of Squares. 

7632-5 

1951825 

I3474-8.3 

138-0925 

40763 64583 
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block in fact we may compare the plots having single 
and double dressings of nitrogen with twice the sum 
of the plots having none. Thus two blocks give 


+ (P s l) + Oi) + O2) + ! (M) +' O2) 

+ (u 1 ) + (u 2 )-2(j>)-2(j)-2(l)-2(l) - 1483 - 

B 


1157 ; 

c 


while the other two give 

Cb ) + 0 2) +(/>"* 1) - K/ "*2) + Oi) + (*2) + (/ ,w i) 

^ +(i>« a )-2(^)-2(^)-2(i)-2(i) = 1015, 1480; 

. A D . 

whence we obtain by subtraction 

(p — i)(r 1 +r 2 — w*i— u \~~ u i) ~ T 45- 


The value for each block is a combination of 8 plots
with coefficient +1, and 4 plots with coefficient
−2, so the sum of the squares is 24, or for the
four blocks, 96. The contribution to the sum of
squares of this partially confounded manurial
comparison is therefore 145² ÷ 96, or 219.0104. To the
divisor, 96, the plots having the treatments to be
compared contribute only 32, so that the comparison
is made with only one-third of the precision of the
16 unconfounded comparisons.
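
The arithmetic of this partially confounded comparison can be retraced in a few lines. The following is only an illustrative check in Python, with variable names of our own choosing; it assumes nothing beyond the four block values and the coefficients +1 and −2 quoted in the text.

```python
# Block values quoted in the text for the comparison of nitrogen-treated
# plots with twice the sum of the plots having none.
blocks_b_c = [1483, 1157]
blocks_a_d = [1015, 1480]

# Subtracting one pair of blocks from the other isolates the interaction
# of phosphate with the contrast (s - m + c - u).
contrast = sum(blocks_b_c) - sum(blocks_a_d)

# Each block value uses 8 plots with coefficient +1 and 4 with coefficient -2.
divisor_per_block = 8 * (+1) ** 2 + 4 * (-2) ** 2
divisor = 4 * divisor_per_block

sum_of_squares = contrast ** 2 / divisor

# Only the 8 nitrogen-treated plots in each block carry the comparison itself.
treated_part = 4 * 8 * (+1) ** 2
relative_precision = treated_part / divisor

print(contrast, divisor, round(sum_of_squares, 4), relative_precision)
```

The ratio of the treated part of the divisor to the whole divisor is what the text describes as one-third of the precision of an unconfounded comparison.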

The other elements in the analysis may now be
evaluated. The 3 degrees of freedom between blocks
are easily found to account for a contribution of
12,215.75 to the sum of squares. The total effect of
treatments could now be obtained by subtraction of
the three items already evaluated from the total; the
interest of the experiment, however, lies in evaluating
the separate factors of the treatment differences.
The total yields, contributed by 8 plots each, in the
six classes of treatment formed by combining two
levels of phosphate with three of nitrogen, are shown
in Table 21.

TABLE 21

                        No          Single      Double
                        Nitrogen.   Nitrogen.   Nitrogen.   Total.
With phosphate            2237        3280        3876       9,393
Without phosphate         1996        2946        3499       8,441

Sum                       4233        6226        7375      17,834
Difference                 241         334         377         952



Plots receiving phosphate have exceeded those
not receiving phosphate in all by 952 units, so that
the 1 degree of freedom, P, contributes 952² ÷ 48,
or 18,881.3.

We may next take the 2 degrees of freedom for 
quantity of nitrogen N, and 2 more for interaction 
with phosphate NP. 

The 2 degrees of freedom for N are evidently
found by comparing the sums 4233, 6226, 7375;
clearly the principal effect, the contrast between
double nitrogen and none, is the important part; the
difference 3142 from 32 plots contributes the large
item 3142² ÷ 32, or 308,505.125 for N₁. The remaining
degree of freedom, corresponding to diminishing
return for the second dose of nitrogen, is found by
subtracting the first and last totals from twice the
total for single nitrogen, squaring, and dividing by
96. This gives 844² ÷ 96, or 7420.17 for N₂, a much
smaller, but still a significant, value. We may treat
the differences in the same way. For N₁P we have
136² ÷ 32, or 578, and for N₂P 50² ÷ 96, or 26.0416,
both quite insignificant contributions, though in both
cases of the expected sign. The items evaluated so
far are:

TABLE 22

          Degrees of     Sum of
          Freedom.       Squares.
N₁            1         308505.125
N₂            1           7420.167
P             1          18881.333
NP            2            604.042
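
As a check on Table 22, the single degrees of freedom can be recomputed from the class totals of Table 21. The sketch below is illustrative only; the helper names `linear` and `curvature` are ours, and the divisors 48, 32 and 96 are those stated in the text.

```python
# Class totals from Table 21 (each total is contributed by 8 plots).
with_p = [2237, 3280, 3876]      # no, single, double nitrogen
without_p = [1996, 2946, 3499]

sums = [a + b for a, b in zip(with_p, without_p)]
diffs = [a - b for a, b in zip(with_p, without_p)]

def linear(t):
    # Contrast between double nitrogen and none.
    return t[2] - t[0]

def curvature(t):
    # Twice the single-nitrogen total less the first and last totals.
    return 2 * t[1] - t[0] - t[2]

ss_p = sum(diffs) ** 2 / 48
ss_n1 = linear(sums) ** 2 / 32
ss_n2 = curvature(sums) ** 2 / 96
ss_n1p = linear(diffs) ** 2 / 32
ss_n2p = curvature(diffs) ** 2 / 96
ss_np = ss_n1p + ss_n2p

for name, value in [("N1", ss_n1), ("N2", ss_n2), ("P", ss_p), ("NP", ss_np)]:
    print(name, round(value, 3))
```

The same two contrasts are applied first to the sums (giving N₁ and N₂) and then to the phosphate differences (giving the two components of NP).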


We may now consider the qualitative differences
Q, and their interaction with quantity NQ. The
totals from 4 plots each for the single and double
application of the four nitrogenous nutrients are
shown in Table 23.

TABLE 23

                        Material.                Differences between Pairs.
Quantity.       s.      m.      c.      u.      s+m−c−u   m+u−s−c   m+c−s−u
(1)            1524    1618    1615    1469
(2)            1693    2110    1607    1965
2(2)+(1)       4910    5838    4829    5399       520      1498       358
2(1)−(2)       1355    1126    1623     973      −115      −879       421



The 3 degrees of freedom for Q and for NQ can
now be found either by taking the sums of the squares
of the deviations from their means, of the last two
lines and dividing by 20, or by splitting the columns
in the three possible ways into two opposed pairs,
and dividing by 80. The latter process gives

TABLE 24

                    Squares of Differences.
                       Q             NQ
s+m−c−u              270400         13225
m+u−s−c             2244004        772641
m+c−s−u              128164        177241

Total               2642568        963107

Divided by 80       33032.1      12038.8375

being each for 3 degrees of freedom. The separate
evaluation of these three comparisons as above brings
to light the somewhat suspicious circumstance that
the largest contribution in each class is from the
particular contrast between nitrogenous materials
which has been used (in its interaction with phosphate)
for confounding. If it is a coincidence that the two
pairs of nutrients most contrasted in their effects on
yield, and in their interaction with quantity of nitrogen,
have been chosen for the purpose, then the choice has
been an unfortunate one. If not, then we may suspect
that the conditions in the different blocks of land used
have, in some obscure way, influenced the apparent
reaction to these nutrients.
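
The two routes to the sums of squares for Q and NQ described above (deviations from the mean divided by 20, or three opposed pairings divided by 80) may be compared numerically. The following Python sketch is an illustration only; the function names are ours, and the totals are those of Table 23.

```python
# Totals from 4 plots each (Table 23).
single = {"s": 1524, "m": 1618, "c": 1615, "u": 1469}
double = {"s": 1693, "m": 2110, "c": 1607, "u": 1965}

q_line = {k: 2 * double[k] + single[k] for k in single}    # 2(2) + (1)
nq_line = {k: 2 * single[k] - double[k] for k in single}   # 2(1) - (2)

# The three ways of splitting four materials into two opposed pairs.
PAIRINGS = [("s", "m", "c", "u"), ("m", "u", "s", "c"), ("m", "c", "s", "u")]

def by_pairs(line):
    total = sum((line[a] + line[b] - line[c] - line[d]) ** 2
                for a, b, c, d in PAIRINGS)
    return total / 80

def by_deviations(line):
    mean = sum(line.values()) / 4
    return sum((v - mean) ** 2 for v in line.values()) / 20

print(by_pairs(q_line), by_deviations(q_line))
print(by_pairs(nq_line), by_deviations(nq_line))
```

Both routes agree because the three pairings are mutually orthogonal, so that the sum of their squares is four times the sum of squares of the deviations.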

Had we adopted the subdivision between Q and
NQ by means of the sums (2) + (1) and differences
(2) − (1), we should have 21,739.094 for Q and
23,331.844 for NQ, making the same total, but giving
a larger contribution for interaction than for the
prime factors of quality. The subdivision employed
above is therefore preferable, as based on a view
of qualitative differences more in keeping with the
facts.

The remaining interactions of phosphate with the
quality of the nitrogenous application QP, and of
phosphate with quality and quantity of nitrogen
NQP, may be evaluated in a manner similar to Q
and NQ, using the differences in place of the sums
of the plots which have and have not received
phosphate. In this group, however, we must remember
that a particular component involving the contrast

$$(s - m + c - u)$$

has been confounded with blocks. The differences in
yield between plots receiving phosphate and those
receiving none are shown below:


TABLE 25

                        Material.              Differences between Pairs.
Quantity.       s.     m.     c.     u.      m+s−c−u   m+u−s−c   s+u−c−m
(1)            136    150     67    −19
(2)            307    −18     69     19
2(2)+(1)       750    114    205     19        640      (822)      450
2(1)−(2)       −35    318     65    −57        275     (−231)     −475


The two unconfounded comparisons in the group QP
make therefore a contribution evaluated by summing
the squares of 640 and 450 and dividing by 80. This
gives 7651.25 for these 2 degrees of freedom. The
contribution of the two corresponding components of
NQP is the sum of the squares of 275 and 475 divided
by 80, or 3765.625. The manurial comparison which
has been confounded, namely,

$$(p - 1)(s_1 + s_2 - m_1 - m_2 + c_1 + c_2 - u_1 - u_2),$$

is not precisely a component either of QP or of NQP,
as we have defined these groups. It would be a
component of QP on the alternative definition
discussed above, and the remaining unconfounded
portion is the component of NQP of that definition,
namely,

$$(p - 1)(s_2 - s_1 - m_2 + m_1 + c_2 - c_1 - u_2 + u_1).$$

This gives 303² ÷ 32, or 2869.031.
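
The three items just obtained can be recomputed from the phosphate differences of Table 25. The sketch below is illustrative; the names are ours, and the sign pattern `SIGNS` is simply the contrast (s − m + c − u) named in the text as the one used for confounding.

```python
# Phosphate differences for single and double applications (Table 25).
single = {"s": 136, "m": 150, "c": 67, "u": -19}
double = {"s": 307, "m": -18, "c": 69, "u": 19}

qp_line = {k: 2 * double[k] + single[k] for k in single}    # 2(2) + (1)
nqp_line = {k: 2 * single[k] - double[k] for k in single}   # 2(1) - (2)

# The two pairings not involved in the confounded contrast.
UNCONFOUNDED = [("m", "s", "c", "u"), ("s", "u", "c", "m")]

def pair_squares(line):
    total = sum((line[a] + line[b] - line[c] - line[d]) ** 2
                for a, b, c, d in UNCONFOUNDED)
    return total / 80

ss_qp = pair_squares(qp_line)
ss_nqp = pair_squares(nqp_line)

# Remaining unconfounded portion: the contrast (s - m + c - u) applied to
# the change from single to double application.
SIGNS = {"s": 1, "m": -1, "c": 1, "u": -1}
residual = sum(SIGNS[k] * (double[k] - single[k]) for k in single)
ss_residual = residual ** 2 / 32

print(ss_qp, ss_nqp, residual, ss_residual)
```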

The complete analysis of the variations observed 
among the yields of the 48 plots may therefore be set 
out as in the table below. 


TABLE 26

                          Degrees of    Sum of          Mean          ½ log_e
                          Freedom.      Squares.        Square.
Blocks                        3         12215.750
N₁                            1        308505.125     308505.125      2.8659
P                             1         18881.333      18881.333      1.4691
Q                             3         33032.100      11010.700      1.1995
N₂                            1          7420.167       7420.167      1.0021
NQ                            3         12038.838       4012.946      0.6948
QP                            2          7651.250       3825.625      0.6709
NP                            2           604.042        302.021
NQP                           2          3765.625       1882.812
NQP, QP (unconfounded)        1          2869.031       2869.031
NQP, QP (confounded)          1           219.010        219.010
Error                        27         40763.646       1509.765      0.2060

Total                        47        447965.917


The total sum of squares for the 47 degrees of
freedom, which have above been evaluated individually,
must check with the sum of the squares of the
deviations of the yields from the 48 plots from their
mean without regard to the manurial treatments they
have received, or to their topographical arrangement.
This affords a check both on the arithmetic and on
the logic of our procedure, at least so far as to show
that it has consisted in a subdivision or partition of
the different components of variation actually present.
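
The additive part of this check, namely that the separate items make up the stated totals, is easily verified. The following Python fragment is a sketch only; it uses the degrees of freedom and sums of squares as read from Table 26, and the row labels are ours.

```python
# (label, degrees of freedom, sum of squares) as read from Table 26.
table_26 = [
    ("Blocks", 3, 12215.750),
    ("N1", 1, 308505.125),
    ("P", 1, 18881.333),
    ("Q", 3, 33032.100),
    ("N2", 1, 7420.167),
    ("NQ", 3, 12038.838),
    ("QP", 2, 7651.250),
    ("NP", 2, 604.042),
    ("NQP", 2, 3765.625),
    ("unconfounded portion", 1, 2869.031),
    ("confounded portion", 1, 219.010),
    ("Error", 27, 40763.646),
]

total_df = sum(df for _, df, _ in table_26)
total_ss = sum(ss for _, _, ss in table_26)

print(total_df, round(total_ss, 3))
```

The comparison with the sum of squares of the 48 individual yields about their mean cannot be repeated here, since the plot yields themselves are not reproduced in this section.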

Next, it may be noticed that the confounding 
employed has involved a component of treatment 
recognisable as an interaction of P with one of the 
quality comparisons Q, but not identifiable with 
either of the particular aspects which we have 
thought it proper to recognise respectively as QP 
and NQP. It thus resembles the components con- 
founded in the experiment with 27 treatments, 
discussed in Chapter VII., and, as in that case, would 
be a source of inconvenience, if the unconfounded 
component observed were one of any importance. 
The table shows, however, that in the present case 
the component in question is of no practical interest. 

53. Interpretation of Results 

The treatment comparisons in the table have been
arranged to show first those which have had a
significant influence on the yield of grain, next those
in which there may perhaps be an indication of real
influence, but of a magnitude which could only be
demonstrated by more precise experimentation, and
finally those which in the present experiment appear
to have exerted no appreciable effect. By far the
largest contribution is made by what we have called
the principal effect of nitrogen, this 1 degree of
freedom containing indeed more than two-thirds of
the total. The mean square is over 200 times the
mean square for error, or, since √200 exceeds 14, we
may see at once that the general effect of nitrogen
in this experiment is over 14 times its standard error
and is therefore determined with comparatively high
precision. The single degree of freedom for phosphate
has a mean square over twelve times the average,
showing that this effect also is certainly significant,
though the quantitative value of this ingredient has
been evaluated only roughly.

The statistical significance of each contribution to
the total is most easily determined from the last
column, which shows the half values of the natural
logarithms of the mean squares. The table of z
(Statistical Methods, Table VI.) shows that, with
27 degrees of freedom for error, the amount by which
this entry may exceed that for error, at a 5 per cent.
level of significance, is

    0.7187 for 1 degree of freedom,
    0.6051 for 2 degrees of freedom,
and 0.5427 for 3 degrees of freedom.

The corresponding values for significance at the
1 per cent. level are 1.0191, 0.8513 and 0.7631. The
value for Q is therefore significant on the higher
standard (1 per cent.) and that for N₂ on the lower
standard (5 per cent.). We may therefore take the
values of Table 23 to indicate that chloride of
ammonia was really more successful than sulphate of
ammonia or cyanamide in stimulating grain production,
with urea in an intermediate position, and that
the second nitrogenous application was in general less
fruitful than the first.

The mean squares for NQ and for QP, though
considerably larger than that for error, do not reach
the 5 per cent. level of significance. It therefore
appears that the suggestion of the figures of Table 23
that chloride of ammonia and urea are not only more
successful than sulphate of ammonia and cyanamide,
but are disproportionately so in the double application,
though supported by the data, is not demonstrated;
and that the suggestion of Table 25 that, when
sulphate of ammonia is used, superphosphate is more
effective than with the other nitrogenous fertilisers,
must also be regarded as doubtful. The remaining
6 degrees of freedom, ascribable to manurial treatment,
are clearly insignificant to such an extent that it
would have made no appreciable difference if their
effects had been included with those of pure error.
This circumstance shows that the principle used in
the choice of a component for confounding was in
fact justified by the result. Their separate evaluation
serves to show how this can be done, and supplies the
safeguard that our positive conclusions are based on
an estimate of error uncontaminated by possible
interactions among the treatments.
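
The comparisons with the 5 per cent. points may be retraced as follows. This is a sketch under stated assumptions: the mean squares and the three 5 per cent. values of z are those quoted in the text, the dictionary and function names are ours, and the remark on the scaling of the last column of Table 26 is our own inference from the figures (the tabulated entries agree with half the natural logarithm of the mean square expressed in thousands), which does not affect the differences used in the test.

```python
import math

error_ms = 1509.765

# name: (degrees of freedom, mean square), as read from Table 26.
mean_squares = {
    "N1": (1, 308505.125),
    "P": (1, 18881.333),
    "Q": (3, 11010.700),
    "N2": (1, 7420.167),
    "NQ": (3, 4012.946),
    "QP": (2, 3825.625),
}

# 5 per cent. values of z for 27 degrees of freedom for error (from the text).
Z_5_PERCENT = {1: 0.7187, 2: 0.6051, 3: 0.5427}

def z_excess(ms):
    # Difference of the half natural logarithms of the two mean squares.
    return 0.5 * math.log(ms / error_ms)

significant = {name: z_excess(ms) > Z_5_PERCENT[df]
               for name, (df, ms) in mean_squares.items()}

# Assumption: the tabulated column corresponds to mean squares in thousands.
tabulated_error_entry = 0.5 * math.log(error_ms / 1000)

for name, (df, ms) in mean_squares.items():
    print(name, df, round(z_excess(ms), 4), significant[name])
```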

This example illustrates the fact that when
quantitative and qualitative factors are combined in
the same experiment, the special meaning of their
interactions may well be taken into account in
experimental design. Especially, when the quantities
involved include zero, some of the treatment
comparisons vanish, leading, on the one hand, to an
increase of the comparisons available for error, and
sometimes also to the partial recovery of comparisons
which would otherwise have been totally confounded.
The reader should consider the effects of one simple
modification of the design used, by supposing that the
component

$$(p - 1)(s_1 + s_2 - m_1 - m_2 + c_1 + c_2 - u_1 - u_2)$$

were confounded with one pair of blocks, and the
component

$$(p - 1)(s_1 - s_2 - m_1 + m_2 + c_1 - c_2 - u_1 + u_2)$$

with the other.


54. An Experiment with 81 Plots

In considering the experiment with 27 treatments
in Chapter VII. it was shown that these could be
arranged in blocks of 9 at the expense of confounding
2 of the degrees of freedom representing triple
interactions. It was also shown that when replication can
be carried out in multiples of 4, the confounding
could be spread equally over the whole of these
8 degrees of freedom, so that all triple interactions
could be recovered with some relative loss of precision
though possibly an absolute increase. When
quantitative and qualitative factors are combined in the
same experiment there is little point in restricting
the effects of confounding to the triple interactions,
as defined for the purpose of that example. Moreover,
it is often necessary to see what can be done in
experiments of less than 108 plots. The following
design, which was carried out with potatoes at
Rothamsted in 1931, illustrates a method of utilising
81 plots so as to gain the principal advantages which
the experiment was intended to secure.

The factors to be tested were three levels in the
ratio 0, 1, 2 of nitrogenous manure in combination
with three similar levels of potassic manures. The
potassium was to be supplied in three qualitatively
different materials, namely, potassium sulphate (s),
potassium chloride (m), and a material known as
potash manure salts (p), consisting of potassium
chloride with a large admixture of common salt.
The plots were divided in 9 blocks of 9 plots each,
each block containing one plot with every possible
combination of the three levels of nitrogen with the
three levels of potash. The 3 plots without potash
therefore received respectively 0, 1 and 2 doses of
nitrogen. The same was true of the 3 plots receiving
a single potassic dressing, and of the 3 plots
receiving double potassic dressings; but in the
case of these we have to choose in which form the
potassium shall be supplied. In fact, at each level
of potash one plot received sulphate, one chloride,
and the third potash manure salts. The only ways in
which the blocks can differ consist in the manner in
which the three kinds of potash in each level are
assigned to plots receiving 0, 1 or 2 quantities of
nitrogen.

Considering only plots receiving single potassic
dressings we may designate those which receive
sulphate of potash with 0, 1 and 2 quantities of
nitrogen by s₀, s₁ and s₂. Then the set of plots at this
level within any block will have some such formula
as s₀m₁p₂, or, if we make the convention that the
suffices are to be taken in their natural order, simply
by smp. If, now, corresponding to the block or
blocks represented by the formula smp there are
equal numbers of blocks represented by the formulae
mps and psm, it is clear that the 9 kinds of plots
which receive single potassic dressings will occur in
the experiment in equal numbers; and, in fact, that
we may assign 3 of our blocks to each of these
formulae. We might equally have used the formulae
spm, pms and msp, but our choice is limited to
one or other of these two cyclic sets. The same is true
of the specification of the blocks in respect of the plots
in them which receive a double dressing of potash. The
particular design we shall consider is that formed by
choosing one of these cyclic sets for single potash,
making a similar choice for the double potash, and
finally deciding that each of the 3 blocks which
have the same formula at one level of potash shall
have three different formulae at the other, so that
the 9 blocks are all assigned to different sets of
treatments. They are all alike, in sets of three, at one
level of potash, and in different sets of three at the
other level, like the rows and columns of a 3 × 3 square.
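
The construction just described can be sketched in Python. This is an illustration under assumptions of our own: the same cyclic set (smp, mps, psm) is used at both levels of potash, whereas the text leaves the choice at each level open, and all names are ours.

```python
from collections import Counter
from itertools import product

# A formula lists the material given to the plots with 0, 1, 2 doses of
# nitrogen, in that order. One of the two cyclic sets named in the text:
CYCLIC_SET = ["smp", "mps", "psm"]

# Each block pairs a formula for single potash with one for double potash,
# like the rows and columns of a 3 x 3 square.
blocks = list(product(CYCLIC_SET, CYCLIC_SET))

def potash_plots(formula, level):
    # (nitrogen dose, material, potash level) for the three plots concerned.
    return [(dose, material, level) for dose, material in enumerate(formula)]

counts = Counter()
for single_formula, double_formula in blocks:
    counts.update(potash_plots(single_formula, 1))
    counts.update(potash_plots(double_formula, 2))

print(len(blocks), len(counts), set(counts.values()))
```

Every one of the 18 kinds of plot receiving potash then occurs in exactly 3 blocks, and no two blocks carry the same set of treatments.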

As in the previous example, let us now consider
which comparisons are available for the estimate of
error, and which remain for estimations of the effects
of treatments. Since the three treatments without
potash are the same in every block, the comparisons
among this group will at once yield 16 degrees of
freedom. At the level of single potash the three
treatments are the same in sets of three blocks each,
so that each set yields 4 degrees of freedom for error,
or 12 in all. A second group of 12 is provided by
the level of double potash, bringing the total for
comparisons made between plots at the same level
of potash up to 40. We must now confine ourselves
to comparisons in which plots with the same potassic
dressing in the same block are treated together.
Comparing the plots receiving single potash with
those receiving none in the same block, we see that
this comparison is the same in three sets of three
blocks each, giving 6 degrees of freedom, while six
more are obtained by comparing the double potash
plots with those without potash in the same block.*
There are thus 52 degrees of freedom ascribable
solely to experimental error. Together with 8 for
comparisons between whole blocks, and 20 for
comparisons among the 21 different treatments, we
have enumerated the whole of the 80 degrees of
freedom in the experiment. There could be no more
contributions to pure error, unless some one or more
of the treatment comparisons had been totally
confounded with block differences.
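
The enumeration above follows a single rule: a comparison which is repeated identically in b blocks yields b − 1 degrees of freedom for error. The short Python sketch below applies that rule; the function and the labels are ours, introduced only for illustration.

```python
def error_df(sets, blocks_per_set, comparisons_per_block):
    # A comparison repeated identically in b blocks leaves b - 1 degrees
    # of freedom for error once its mean (the treatment effect) is removed.
    return sets * (blocks_per_set - 1) * comparisons_per_block

components = {
    "among plots without potash": error_df(1, 9, 2),
    "among plots with single potash": error_df(3, 3, 2),
    "among plots with double potash": error_df(3, 3, 2),
    "single potash against none": error_df(3, 3, 1),
    "double potash against none": error_df(3, 3, 1),
}

error_total = sum(components.values())
blocks_df = 9 - 1
treatments_df = 21 - 1
total_df = 81 - 1

print(components, error_total, error_total + blocks_df + treatments_df)
```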

* The last two sets of six components each are not, however, wholly
independent, since the plots without potash are used in both. The sum
of squares for all twelve is most simply obtained by deducting from the
26 degrees of freedom among the totals from each block for the 0, 1,
and 2 levels of potash, the 2 degrees of freedom for K, the 8 for blocks
and the 4 partially confounded effects of treatment which will be
identified later.

We may now consider the manurial comparisons.
There are seven combinations of quantity and quality
of potash, the six comparisons among which may be
resolved into 2 for quantity K, 2 for quality Q,
and the remaining 2 for interaction of quantity and
quality KQ. The distinction between Q and KQ
will be made by the same convention as in the last
example. Variation of the quantity of nitrogen
increases the number of manurial combinations to
21 and therefore introduces 14 new comparisons. Of
these, 2 represent the effects of quantity of nitrogen
only N, 4 the interactions of quantitative variations
of nitrogen and potash NK, 4 the interactions of
quantity of nitrogen with quality of potash NQ, and
4 more the triple interaction NKQ. All these groups
of comparisons, except those denoted NQ and NKQ,
are obviously free from confounding, for they can be
made up directly by comparisons within blocks. It
is only the last 8 degrees of freedom which require
special consideration. As often happens, and as the
previous examples have already illustrated, we shall
best see what has happened to this group of
comparisons by resolving them into components in
a way specially appropriate to the structure of the
experiment.

Since qualitative differences exist only in plots
receiving either single or double potassic dressings,
the eight comparisons, with which we are concerned,
are equivalent to the four representing interactions of
nitrogen with quality of potash on the plots receiving
single potassic dressings, together with the similar
four on the plots receiving double potassic dressings.
Let us consider these two parts separately. Just as
the three independent comparisons among the four
nitrogenous materials of the last example were
subdivided in the same manner as the contrasts between
rows, columns and letters in a 2 × 2 Latin square, so
we shall now use the analogous property which a
3 × 3 square possesses. Let the rows of such a square
correspond to the quantities 0, 1 and 2 of nitrogen
and the columns to the three sorts, s, m and p of
potash. Then three of our blocks have, in respect
to the single potash dressings, the formula spm.
These we may call blocks of type A and insert the
letter A in the three corresponding cells of the square.
There will also be blocks with the formula pms,
which we may call type B, and with the formula
msp which we may call type C. If these letters be
inserted we shall have a 3 × 3 Latin square as shown
below:

                                Kind of Potash.
                                s.      m.      p.
Quantity of nitrogen    0       Aα      Cβ      Bγ
    „         „         1       Cγ      Bα      Aβ
    „         „         2       Bβ      Aγ      Cα
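
The properties of the square shown above can be verified mechanically. The following Python sketch is illustrative, with names of our own; it checks that the Latin and the Greek letters each form a Latin square, that every Latin letter meets every Greek letter once, and that the cells marked A, B and C reproduce the block formulae spm, pms and msp.

```python
from itertools import product

# Rows: 0, 1, 2 quantities of nitrogen; columns: kinds of potash s, m, p.
LATIN = ["ACB", "CBA", "BAC"]
GREEK = ["αβγ", "γαβ", "βγα"]
KINDS = "smp"

def is_latin(square):
    symbols = set(square[0])
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({row[j] for row in square} == symbols for j in range(3))
    return len(symbols) == 3 and rows_ok and cols_ok

# Orthogonality: each Latin letter occurs once with each Greek letter.
pairs = {(LATIN[i][j], GREEK[i][j]) for i, j in product(range(3), repeat=2)}
orthogonal = len(pairs) == 9

def formula(letter):
    # Kind of potash in the cell carrying this letter, for nitrogen 0, 1, 2.
    return "".join(KINDS[LATIN[dose].index(letter)] for dose in range(3))

print(is_latin(LATIN), is_latin(GREEK), orthogonal,
      {letter: formula(letter) for letter in "ABC"})
```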

To the Latin letters in the square, Greek letters
have been added, in such a way that each appears
once in each column, once in each row and once with
each Latin letter. The whole thus constitutes what is
known as a Graeco-Latin square. The fact that a
Graeco-Latin square is possible shows that the eight
independent comparisons among nine objects can be
resolved into four independent sets of 2 degrees of
freedom each, each pair being the comparison between
three sets of three chosen objects each. These are the
two comparisons among rows, two among columns,
two among Latin letters, and two among Greek letters.

It will be observed that the comparisons among Latin
letters have been chosen to correspond with the
differences among the sets of blocks; consequently,
the comparisons among the Greek letters are
independent of these block differences, and like the
comparisons between rows N and columns Q may
be made by comparing yields within the same
block. By adding up the yields of all plots having
treatments of the combinations indicated by the
letters α, β and γ, we may evaluate two treatment
comparisons which have not been confounded. Two
more are found in the same way from the plots
receiving double potash, so that 4 of the 8
comparisons represented by NQ and NKQ are
isolated. The 4 remaining degrees of freedom have
been partially confounded.

The confounding of these four remaining comparisons
with block differences is incomplete, owing
to the fact that the blocks, which differ in respect of
them, agree in containing other plots
with which they may be compared. Thus, in the three
blocks of type A, the plots with single dressings of
potash having the chosen constitutions s₀, p₁ and m₂
are situated in the same blocks with an aggregate of
plots without potash, and with an aggregate
of plots receiving double potash, both of which are the
same as the aggregates which occur in the three blocks
of type B, and in the three of type C. We may therefore
distinguish the treatments s₀, p₁ and m₂ from the other
sets of treatments p₀, m₁ and s₂, and m₀, s₁ and p₂, by
subtracting in each block the sum of the yields from
plots without potash, and with double potash, from
twice the yield of the plots with single potash, and
adding together the results from blocks in which the
plots with single potash have been treated alike. The
manurial comparisons so made clearly represent two
of those which have been confounded with blocks,
but which can be made, in the manner explained
above, by means of comparisons wholly within blocks,
with a satisfactory precision. The sum of the squares
of the coefficients of the expression

$$2k_1 - k_0 - k_2$$

is 6, and to this total the coefficient of k₁ contributes 4.
Consequently, the comparison so made among the
different sets of plots receiving single potash has
two-thirds of the precision of the other comparisons
of the experiment, and so, perhaps, a
higher precision than they would have had, even
if unconfounded, in an experiment with 27 plots to
each block.
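
The fraction two-thirds is simply the share of the divisor contributed by the plots actually being compared. A minimal Python sketch, with names of our own and assuming the coefficients 2, −1, −1 described in the text:

```python
from fractions import Fraction

# Coefficients applied in each block to the yields at the three levels of
# potash: twice single potash, less no potash, less double potash.
coefficients = {"k0": -1, "k1": 2, "k2": -1}

divisor = sum(c ** 2 for c in coefficients.values())
share_of_k1 = coefficients["k1"] ** 2

relative_precision = Fraction(share_of_k1, divisor)

print(divisor, share_of_k1, relative_precision)
```

The same reasoning gave one-third for the partially confounded comparison of the previous example, where the treated plots contributed 32 to a divisor of 96.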

The student may familiarise himself with the
process of analysis described above by applying it to
the yields of tubers in quarter-lbs. shown in the
following table of the arrangement of the treatments
in different plots.

The upper and lower figures represent yields with
and without phosphate, which was applied as an
additional manure to half, chosen at random, of each
plot. The sums of these yields may therefore be
analysed as explained above, but their differences,
representing the effects of the phosphatic manure,


TABLE 27 


Arrangement and Yields of a Complex Experiment 
( spm ) ! {s mp)i 


751 

844 

»»*2 

733 

829 

*0*1 

686 

825 

«i • *2/1 

851 890 

800 733 

*1*2 

874 

813 

*j*l * 1 7*2 

1026 947 

1050 871 1 

*2 

990 

[OO6 

*0 

796 

705 

*2 

910 

866 

«i7*i 

909 

778 

« 0 *2'" 2 

855 1026 

779 815 

*0 7*2 
865 

816 

ft 2 W qW 2 

ms 853 
997 953 

*o7*i 

795 

843 

*1 

895 

965 

«2»*1 

1034 

1046 

n t pi 

1052 

830 

*2 

913 

752 

«0>*1 

756 

830 

*1*1 

892 

930 

*0 

1024 

979 

*1 

972 

lOOO 

«i«i, 

975 

884 

*0 

I IOO 

996 

«o'*i 

1014 

968 

*0**2 

975 

902 

*0*2 

956 

898 

*2*1 

1037 

975 

«!>*! 

1035 

1027 

*1*2 

1284 

1 176 

*0 

1012 

966 

w,Wi 

1001 

977 

»i*i 

1029 

1022 

*1 Pt 
1121 
961 

*2*2 

1252 

1127 

*0 

1058 

1006 

*0/1 

1038 

904 

n,W! 

1098 

1206 

*2 

1316 

1275 

*0*1 

II36 

1049 

*1 

1087 

1001 

• 

*i 

959 

93 ° 

n*Pi 

1178 

1102 

*2 

1317 

1145 

*2 

1270 

1 1 18 

*27*2 

1234 

H34 

*1 

1195 

1132 

* 1 ) 7*2 

1307 

1348 

*2«1 

1224 

1275 

«i7*i 

1069 

1128 

n % 

113 1 

1034 

n x m % 

1140 

1156 

«o 7 *i 

1055 

1026 

»i 

1151 

1044 

«i7*i 

1147 

1056 

*2*2 

1156 

1228 

* 27*2 

1401 

1391 

n x m % 

1214 

1321 

*0 

1005 

101 1 

»1»*1 

980 

1027 

*1*2 

1243 

*1065 

»i 

1224 

1064 

« a l*l 

1192 

1199 

*1 7*2 
1225 
1120 

*2 

1305 

1276 

*0*2 

1310 

1339 

*2 

1421 

1417 

* 0 >«i 

1190 

1208 

*0 

1020 

60S 

*o7*2 

653 

999 

*2*1 

935 

1142 

*0**2 

629 

1056 

*0*1 

947 

1049 

*0 

1020 

1102 

*1*1 

1361 

1201 

*1 

I 167 
1215 

* 27*1 

1222 

IIO8 


and its interactions, are already freed from all block
effects, and will have their own standard error
estimated directly from the discrepancies between
these differences in plots treated alike.

IX

THE INCREASE OF PRECISION BY CONCOMITANT
MEASUREMENTS. STATISTICAL CONTROL

55. Occasions suitable for Concomitant Measurements

In the preceding chapters we have been principally
concerned with the means whereby experimental
precision may be increased through the knowledge
that groups of material may be selected, the parts of
which are more homogeneous than are the different
groups. We have been using such facts as that
animals more nearly related by blood are generally
more alike than animals less nearly related, that men
of the same race or district are likely to be more
similar than men of different races, that plots of land
resemble one another more nearly in fertility the
closer together they lie, or that apparatus supplied by
the same manufacturer will generally be more nearly
comparable than the makes supplied by different firms.
It has been shown that very great increases in precision
are possible by utilising these and analogous facts,
even when the amount of material which is closely
homogeneous with any chosen unit is extremely
limited, provided that within this limitation, we may
assign the treatments to be tested at will so as to build
up a comprehensive experiment.

There is a second means by which precision may,
in appropriate cases, be much increased by the
elimination of causes of variation which cannot be
controlled, which has the advantage of being
applicable when we cannot exercise a free choice in the
distribution of the treatments. For example, in a
feeding experiment with animals, where we are
concerned to measure their response to a number of
different rations or diets, we may often be able to
ensure that the animals entering on the treatments to
be tested shall be of the same age, and often, also,
that they shall be closely related, or of the same
parentage. But such groups of closely related animals
as are available will not, at the same age, have attained
exactly to the same size, as measured by weight, or in
any other appropriate manner. If we decide that they
shall enter the experiment at the same age, it may well
be that the differences in initial weight constitute an
uncontrolled cause of variation among the responses
to treatment, which will sensibly diminish the precision
of the comparisons. If the animals are assigned at
random to the different treatments, either absolutely
or subject to restrictions of the kinds which have been
discussed, the differences in initial weight will not, of
course, vitiate the tests of significance, for, though
they may contribute to the error of our comparisons,
they will then also contribute in due measure to the
estimates of error by which the significance of these
comparisons is to be judged. They may, however,
constitute an element of error which it is desirable,
and possible, to eliminate. The possibility arises from
the fact that, without being equalised, these differences
of initial weight may none the less be measured.
Their effects upon our final results may approximately
be estimated, and the results adjusted in accordance
with the estimated effects, so as to afford a final
precision, in many cases, almost as great as though
complete equalisation had been possible.
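
The kind of adjustment contemplated may be illustrated by a small numerical sketch in Python. Everything in it is an assumption made for illustration: the weights are hypothetical, the function name is ours, and the adjustment shown (a single regression of final on initial weight, estimated within treatments) is one common form of such a correction, not necessarily the exact procedure developed later in this chapter.

```python
def adjusted_means(groups):
    """groups maps a treatment name to a list of (initial, final) pairs."""
    all_initial = [x for pairs in groups.values() for x, _ in pairs]
    grand_initial = sum(all_initial) / len(all_initial)

    sxx = sxy = 0.0
    means = {}
    for name, pairs in groups.items():
        mean_x = sum(x for x, _ in pairs) / len(pairs)
        mean_y = sum(y for _, y in pairs) / len(pairs)
        means[name] = (mean_x, mean_y)
        # Sums of squares and products within treatments only, so that the
        # treatment effects themselves do not enter the estimated slope.
        sxx += sum((x - mean_x) ** 2 for x, _ in pairs)
        sxy += sum((x - mean_x) * (y - mean_y) for x, y in pairs)

    slope = sxy / sxx
    adjusted = {name: mean_y - slope * (mean_x - grand_initial)
                for name, (mean_x, mean_y) in means.items()}
    return slope, adjusted

# Hypothetical initial and final weights for two rations.
rations = {
    "A": [(10, 25), (12, 29), (14, 33)],
    "B": [(11, 30), (15, 38), (16, 40)],
}

slope, adjusted = adjusted_means(rations)
print(slope, adjusted)
```

In these invented figures the animals on ration B happened to start heavier, so that the unadjusted difference between the mean final weights overstates the difference remaining after allowance for initial weight.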

Similar situations frequently arise in other fields
of work. In agricultural experiments involving the
yield following different kinds of treatments it may
be apparent that the yields of the different plots have
been much disturbed by variations in the number of
plants which have established themselves. If we are
satisfied that this variation in plant number is not
itself an effect of the treatments being investigated,
or if we are willing to confine our investigation to the
effects on yield, excluding such as flow directly or
indirectly from effects brought about by variations in
plant number, then it will appear desirable to introduce
into our comparisons a correction which makes
allowance, at least approximately, for the variations in
yield directly due to variation in plant number itself.
In introducing such a correction it is important to
make sure that our procedure shall not in any way
invalidate the test of significance to be applied to the
comparisons, and thought will often be required to
assure ourselves that the effects eliminated shall really
be only those which are irrelevant to the aim of the
experiment.

Again, let us suppose that a number of remedial
treatments are to be tested on an orchard or plantation,
the trees of which show in varying measure the effects
of disease. It might be possible to grade the individual
plants prior to the application of the treatments, and
to apply the treatments to equal numbers of plants
showing each grade of injury. But this would not
always be possible, especially if it is not to individual
plants but to small plots, each containing several
plants, that the remedial measures must be applied.
Such a procedure would also, in any case, necessarily
sacrifice the advantage of propinquity of the areas
which it is desired to compare. To meet this difficulty
it is open to us to apply the different treatments
to plots randomised and adequately replicated, but
chosen without regard to the initial grade of injury
of the plants they contain. The grades of injury of
these plants may, however, be recorded both initially
and finally, when the treatments may be supposed to
have exerted such effects as they are capable of, and
the comparison of the final condition of the plots
which have received different treatments may be
adjusted to take account of the degrees of injury
initially shown by these same plots.

With perennial plantations the same principle may 
very advantageously be applied to studies of the 
effects of manuring, pruning, and other variable 
treatments, on the yield. Yield in such cases is 
evidently much influenced, not only by variations in 
soil fertility, but also by the individual capacities of 
different plants, which, whether hereditary or not, 
persist from year to year. Records of the yield of 
individual rubber trees, or of small areas of 
tea-plantation, thus show large and relatively permanent 
differences. In such cases records of yield for a 
preliminary period under uniform treatment provide 
a most valuable guide in interpreting the records after 
the treatments have been varied. It would in these 
cases be possible to choose areas for the different 
treatments such that their previous record was 
approximately equalised. But to do so is usually 
troublesome, inexact, and unnecessary. Moreover, 
as the plots so chosen cannot also be arranged in 
compact blocks, or in other advantageous arrange- 
ments, such as the Latin square, the loss of precision 
due to sacrificing this advantage is often consider- 
able. It is now usual, therefore, to arrange the plots 
in some way which is topographically advantageous, 
irrespective of their previous records, and to utilise 
the information supplied by these as an adjustment 
or correction to the subsequent yields measured under 
varying treatments. It may be noted, however, that 
with annual agricultural crops, knowledge of the yields 
of the experimental area in a previous year under 
uniform treatment has not been found sufficiently 
to increase the precision to warrant the adoption of 
such uniformity trials as a preliminary to projected 
experiments. Such a procedure necessarily nearly 
doubles the experimental labour and, as it is not 
found to double the amount of information supplied 
by the experiment, but to increase it, perhaps by 
50 per cent., it is clearly unprofitable. For, by the 
application of twice the expenditure in time and money 
in the experimental year, the amount of information 
recovered may with confidence be expected to be 
approximately doubled. Consequently, on grounds of 
precision alone, such preliminary trials with annual 
crops are not to be recommended. The fact that they 
entail at least a year’s delay in the experimental results 
adds to the force of this conclusion. 
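
The arithmetic of this comparison may be set out as a small sketch. The figures are those quoted above (nearly twice the labour, perhaps 50 per cent. more information); the function name and the choice of one experimental year as the unit of cost and of information are assumptions made only for illustration.

```python
def information_per_unit_cost(information, cost):
    """Information recovered per unit of labour and expense (illustrative)."""
    return information / cost


# Baseline: one experimental year, taken as 1 unit of cost and of information.
baseline = information_per_unit_cost(1.0, 1.0)

# Preliminary uniformity trial: about twice the cost, about 1.5 times the information.
with_uniformity_trial = information_per_unit_cost(1.5, 2.0)

# Twice the expenditure in the experimental year: cost and information both doubled.
doubled_experiment = information_per_unit_cost(2.0, 2.0)

print(baseline, with_uniformity_trial, doubled_experiment)
```

On these assumed figures the uniformity trial returns three-quarters of the information per unit of cost that simple enlargement of the experiment would give.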

In many cases it may be possible to take two 
or more concomitant measurements, each of which 
severally may be expected, when proper allowance is 
made for it, to increase the precision of the comparisons 
to be made, and which, if used jointly, may increase 
them still further. Thus, if groups of school children 
be supplied, in addition to their home diet, with a ration 
of milk, either raw or pasteurised, children in the 
same school may be assigned properly at random to 
the groups receiving these different additions to their 
diet. With large numbers of subjects the age distributions 
of the two groups may be very nearly equalised, 
but with the smaller numbers attending a particular 
school such equalisation of age will necessarily be 
somewhat inexact, and, apart from age, it is certain 
that the two groups of children will differ somewhat 
in the initial height and weight. The variations in 
these initial values, moreover, may all be suspected 
of having, possibly, an appreciable influence on the 
apparent response to the nutrients as measured by 
increments in height or weight. The most thorough 
procedure for such a case would be to eliminate, or 
make allowance for, all these three variables jointly ; 
and, though it might not in fact be necessary to take 
account of more than two, or of even one of them, 
we could only assure ourselves that such a simple 
procedure was in reality effective by examining the 
effects of making allowance for all three jointly. 


56. Arbitrary Corrections 

In the examples outlined above, in which an 
observable but uncontrolled concomitant might reason- 
ably be expected, if proper account can be taken of it, 
to add to the precision of the results, it is still a common 
practice to introduce corrections based on a priori 
grounds, without reference to what the data themselves 
have to tell of the amount of the corrections 
to be applied. Thus in a feeding experiment with 
animals it might be thought proper to take account 
of the variation in their initial weights by calculating 
the responses of different individuals, not by their 
absolute increases in weight, but by their increase 
relative to their initial weight, or as percentage 
increases. Equally, in allowing for the effect of 
variation in plant number upon an agricultural yield, 
it is possible, and has sometimes been thought 
appropriate, to calculate the yield per plant in place 
of the yield per unit area, as the measure of the 
efficacy of the treatments to be compared. In judging 
of the effects of treatments on the grade of visible 
damage caused by disease it might be thought 
sufficient to compare the differences between the 
average grades of the different plants receiving 
any treatment, before and after that treatment has 
been applied, in order to allow for the fact that the 
areas differently treated, though assigned properly 
at random, were not initially in exactly the same 
condition. When allowing for the differences between 
different plots observed in a preliminary trial of a 
plantation, either the proportional system, or that 
dependent on simple differences might equally be 
advocated, and would, perhaps, not give greatly 
different results. 

Such a priori methods of allowing for concomitant 
variates, and attempting to utilise them to increase 
the precision of experimental comparisons, should not 
be rejected as invalid, even though we may know that 
the suppositions on which they are based are 
experimentally untrue. The experimenter, for example, has 
a perfect right to measure the efficiency of different 
feeding stuffs, either by the average percentage 
increase of different animals, or by the average 
absolute increase, as he pleases, and, with a properly 
designed experiment, he will ascertain whether the 
materials tested do or do not give significantly 
different results as measured in these alternate ways. 
He has this right, none the less, even if experiments 
with a uniform feeding mixture, and animals of 
varying initial weight, have shown that the movements 
in weight during the experimental period neither are 
independent of initial weight nor are proportional to 
it. What such experiments would make clear is that, 
for the purpose of detecting differences between the 
feeding stuffs tested, with the greatest possible 
precision in relation to the size of the trial, neither method 
of measuring weight increase is ideal, and that both 
are capable of some improvement. If, for example, 
in experiments in the course of which the average 
weight of the animals had doubled, it was found that 
an initial difference in weight of 1 lb. was followed 
at the end of the experiment by a difference on the 
average of 1½ lbs., it is obvious that an allowance 
on this scale would be preferable, for the purpose of 
comparing different feeding stuffs, either to an 
allowance of a pound for a pound, as is the effect of 
taking simple weight increases without regard to the 
initial weight, or to an allowance of 2 lbs. for 1, 
which would be approximately the effect of judging 
the experimental results by the proportional increases 
in weight. 
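
The preference may be made explicit by a short calculation, offered as a sketch rather than as part of the original argument. Write x for the initial and y for the final weight, and suppose that among animals treated alike x has variance σ_x² and that the average change in y per unit of x is b (read here as 1½ lbs. per lb.). The symbols β, b and σ_x are introduced only for this illustration; β is whatever allowance per pound the experimenter chooses to apply.

```latex
\operatorname{Var}(y - \beta x)
  = \operatorname{Var}(y) - 2\beta\,\operatorname{Cov}(x,y) + \beta^{2}\sigma_x^{2}
  = \operatorname{Var}(y - b x) + (\beta - b)^{2}\,\sigma_x^{2},
\qquad
b = \frac{\operatorname{Cov}(x,y)}{\sigma_x^{2}} .
```

With b = 1½, the allowance β = 1 (simple increases) and the allowance β = 2 (approximately, proportional increases) each leave an excess variance of ¼σ_x², which the empirical allowance β = b removes.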

Preliminary investigations of the correct allowance 
to make for concomitant variates are usually wanting, 
and are, fortunately, not a practical necessity, for the 
results of a replicated experiment may themselves be 
used to supply what is wanted. Let us suppose that 
five feeding stuffs are to be tested, each on ten pigs, 
the animals being assigned to the different rations 
entirely at random. The average initial weights of 
the groups assigned to the different feeding stuffs will 
therefore vary somewhat by chance, though this 
variation will not be so great as the variation between 
the initial weights of the different animals receiving 
the same feeding stuff. The assignment being at 
random in fact gives an assurance that the average 
differences between the different lots of ten shall be 
smaller than the individual differences in the ratio 
1 : √10, or, in fact, should be rather less than one- 
third as great. A direct comparison, within the 
groups receiving the same mixture, of the extent to 
which greater initial weight is followed by greater 
final weight, will, therefore, generally supply an 
estimate of the true allowance to be made of amply 
sufficient accuracy for the small adjustments which 
are to be based upon it. Moreover, such an allowance, 
based on the very same data to which it is to be 
applied, is generally preferable to one based on other 
experiments, even if these are much more extensive, 
since it is certain that the conditions in which different 
experiments are made vary greatly, and in many 
unknown and uncontrolled ways. We have no 
assurance that the allowance appropriate to one set of 
conditions, or to one type of material shall still be 
even approximately appropriate when the conditions 
and the material are varied. Consequently, even if the 
appropriate allowance for each concomitant variable 
had been previously ascertained by sufficiently exten- 
sive experimentation, it would still be advantageous 
to rely, in each particular case, on the internal evidence 
of the experiment in question. It may also be noted 
that by doing so the experiment conserves its property 
of being self-contained, and, therefore, adequate to 
supply genuinely independent testimony on any point 
in dispute, and that such complete independence is 
attenuated, if not lost, if extraneous data are introduced 
in the process of its interpretation. 


57. Calculation of the Adjustment 
The process of calculating the average apparent 
effect on the experimental value of the increase of one 
unit of the concomitant measurement is, in principle, 
extremely simple. Statistically, such values fall into 
the class of what are known as “regression 
coefficients,” and the variety of methods, appropriate 
to calculating such regressions, forms an extensive 
subject, which is treated more fully in the author's 
book, Statistical Methods for Research Workers. To 
illustrate the principle used, the detailed working for 
a simple case will be given here, from which the 
reader who is unfamiliar with regressions will be 
able to see exactly what the calculation amounts to, 
though a fuller study would be needed to recognise 
how the operations should best be carried out in 
particular cases. We will suppose that five feeding 
mixtures are being tested in respect to the live weight 
increase produced by them, between fixed limits of 
age, on groups of ten pigs, assigned at random to 
each of the five mixtures. If no account whatever 
were to be taken of the initial weight (x), we might 
deal with the final weights (y) as follows. The ten 
final weights for each treatment are added to give 
totals corresponding to each treatment (A, B, C, D, E), 
and divided by 10 to give the corresponding mean 
values (a, b, c, d, e). To judge of the significance of 
the differences between these totals, or between these 
means, we must make an estimate of the magnitude 
of the variations due to uncontrolled causes, including 
initial weight, and this we may do by examining 
the variation in final weight among pigs fed with 
the same mixture. Each set of ten pigs treated 
alike will supply 9 degrees of freedom for this 
purpose, or 45 degrees of freedom in all, for the 
estimation of error. The sum of squares, corresponding 
to each 9 degrees of freedom, is found by 
squaring the ten final weights, adding the squares, 
and deducting Aa, the product of the total and mean 
weight for the treatment. The portion of the sum of 
squares corresponding to the 4 degrees of freedom for 
variance among treatments is likewise found by 
adding together the products of total and mean for 
the several treatments, and deducting the product of 
the grand total and the general mean, i.e., 

Aa + Bb + Cc + Dd + Ee - Mm ; 

where M stands for the grand total and m for the 
general mean. The analysis of variance then takes 
the simple form 

TABLE 28 

                  Degrees of 
                  Freedom. 

Treatments  .  .       4 
Error  .  .  .  .     45 

Total  .  .  .  .     49 


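Obviously, an exactly similar analysis could be 
made of the initial weights (x), which are not of 
direct experimental interest. This is, however, 
the first step to take in utilising the values of x in order 
to adjust the final weights. The next step is to form 
the sum of products of the initial and final weights : 
for each feeding mixture, that is, for each 
set of 9 degrees of freedom, we add together the 
products of the initial and final weights for 
the ten pigs treated alike, and deduct the product of 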
the initial total and the final mean, or, what comes to 
the same thing, of the initial mean and the final total. 

We thus have a sum of products for the 45 degrees 
of freedom ascribed to error, comparable in every 
respect with the sum of squares belonging to the 
same degrees of freedom for the initial weights, or 
for the final weights. Equally, for the 4 degrees of 
freedom ascribed to treatments we may find the 
appropriate sum of products by multiplying the initial 
total weight for any treatment by the final mean 
weight, adding the five products so obtained and 
deducting the initial total weight of all the pigs, 
multiplied by their final mean weight. 

The three corresponding tables derived from the 
squares of the final weights, the squares of the initial 
weights and the products of the two series contain all 
that is needed for the adjustment of the final weights, 
and for the further study of the adjusted values. In 
particular, the appropriate adjustment to be subtracted 
from each final weight to allow for each additional 
pound in initial weight, as judged from the internal 
evidence of the experiment, by a comparison among 
pigs treated alike, is found simply by dividing the 
error term of the sum of products by the corresponding 
term in the sum of squares of initial weights. 
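
The calculation just described may be sketched in a few lines. The function names and the miniature data set (two mixtures, three pigs each) are invented purely for illustration and are not taken from any experiment; the figures were constructed so that final weight rises by exactly 1.5 lb. for each pound of initial weight within each group.

```python
def within_group_sums(groups):
    """Error-line sums (Sxx, Sxy, Syy), pooled within treatment groups.

    `groups` maps a treatment label to a list of (x, y) pairs, x being the
    initial weight and y the final weight of one animal.
    """
    sxx = sxy = syy = 0.0
    for pairs in groups.values():
        n = len(pairs)
        x_total = sum(x for x, _ in pairs)
        y_total = sum(y for _, y in pairs)
        # Sum of squares (or products) less the product of total and mean.
        sxx += sum(x * x for x, _ in pairs) - x_total * x_total / n
        syy += sum(y * y for _, y in pairs) - y_total * y_total / n
        sxy += sum(x * y for x, y in pairs) - x_total * y_total / n
    return sxx, sxy, syy


def adjustment_per_unit(groups):
    """Error sum of products divided by the error sum of squares of x."""
    sxx, sxy, _ = within_group_sums(groups)
    return sxy / sxx


def adjusted_means(groups):
    """Treatment means of y, adjusted to the general mean of x."""
    b = adjustment_per_unit(groups)
    all_x = [x for pairs in groups.values() for x, _ in pairs]
    grand_x_mean = sum(all_x) / len(all_x)
    result = {}
    for label, pairs in groups.items():
        x_mean = sum(x for x, _ in pairs) / len(pairs)
        y_mean = sum(y for _, y in pairs) / len(pairs)
        result[label] = y_mean - b * (x_mean - grand_x_mean)
    return result


# Invented figures: y = 1.5 * x + 100 for mixture A, y = 1.5 * x + 110 for B.
example = {
    "A": [(30.0, 145.0), (32.0, 148.0), (34.0, 151.0)],
    "B": [(31.0, 156.5), (33.0, 159.5), (38.0, 167.0)],
}
b = adjustment_per_unit(example)
adjusted = adjusted_means(example)
print(b, adjusted)
```

In this artificial case the unadjusted means differ by 13 lb., of which 3 lb. is due to the heavier animals that chanced to fall to mixture B; the adjusted means differ by the 10 lb. built into the figures.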

This procedure is of quite general application. 
If, for example, the experiment had been of a more 
intricate design, we might have chosen sets of five 
pigs each, from ten different litters, and assigned one 
pig of each litter to each treatment, so that the 
treatments should be tried on animals of more nearly 
equal genetic constitution than if a lot of fifty had 
been distributed wholly at random. The analysis 
would then have taken the form 

TABLE 29 

                  Degrees of 
                  Freedom. 

Litters  .  .  .       9 
Treatments  .  .       4 
Error  .  .  .  .     36 

Total  .  .  .  .     49 

The differences between litters being thus eliminated 
from the experiment, both in the effects of treatment 
and in the estimation of error, we should in conse- 
quence derive the adjustment by dividing the error 
component of the sum of products by the correspond- 
ing component in the sum of squares in initial weights, 
because it is now only the relation between initial and 
final weight among pigs of the same litter that is 
wanted in adjusting the results. We may, therefore, 
in all cases, obtain the empirical adjustment, indicated 
by the particular results of the experiment, by dividing 
the error component of the sum of products by that 
of the sum of squares of the concomitant observation. 
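
For the arrangement in litters, the error component is what remains of the total sum of products after the portions for litters and for treatments have been deducted. A minimal sketch follows; the function name and the small table of three litters by two treatments are assumptions made for illustration, the final weights having been constructed as twice the initial weight plus a litter effect and a treatment effect.

```python
def block_error_component(x, y):
    """Error sum of products for a litters-by-treatments table.

    `x` and `y` are lists of rows, one row per litter and one column per
    treatment.  Passing the same table twice gives a sum of squares.
    """
    rows, cols = len(x), len(x[0])
    n = rows * cols
    correction = sum(map(sum, x)) * sum(map(sum, y)) / n
    total = sum(
        x[i][j] * y[i][j] for i in range(rows) for j in range(cols)
    ) - correction
    litters = sum(sum(x[i]) * sum(y[i]) for i in range(rows)) / cols - correction
    treatments = sum(
        sum(x[i][j] for i in range(rows)) * sum(y[i][j] for i in range(rows))
        for j in range(cols)
    ) / rows - correction
    return total - litters - treatments


# Invented figures: y = 2 * x + litter effect (0, 5, -3) + treatment effect (0, 4).
x = [[10.0, 12.0], [11.0, 15.0], [14.0, 13.0]]
y = [[20.0, 28.0], [27.0, 39.0], [25.0, 27.0]]

exx = block_error_component(x, x)  # error sum of squares of initial weights
exy = block_error_component(x, y)  # error sum of products
b = exy / exx
print(exx, exy, b)
```

Because the litter and treatment effects contribute nothing to the error line, the quotient recovers the allowance of 2 that was built into the figures.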

In cases in which it is desired to make allowance 
simultaneously for two or more concomitant measure- 
ments, separate analyses in the same form should be 
made for each of these, and for the sum of products 
of each with the dependent variate to be adjusted, 
and with each other. The error terms of these tables 
will then provide a system of two or more linear 
equations in accordance with the general procedure 
of partial regression, the solutions of which represent 
the average effects of unit changes in the several 
independent variates. The principle of the adjust- 
ment is thus exactly the same whether we have to do 
with one concomitant variate or with many, and the 
use of two or more such concomitants involves no 
unmanageable increase in the labour of computation. 
The limiting factor in the utility of concomitant 
observations lies rather in the labour of additional 
measurements, which may not, even when the best 
possible use is made of them, lead to so great an 
increase in precision as could be obtained by increas- 
ing the size of the experiment on a simpler plan, or, 
in other ways, by the expenditure of an equivalent 
amount of time and attention. In cases, however, as 
with the initial weights of experimental animals, where 
the measurement to be used as a concomitant is one 
which would not in any case be omitted, the precision 
which can be gained by a direct evaluation of their 
actual effects is entirely profitable to the experiment. 
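
For two concomitant measurements the system of equations may be written out as follows. The notation is an assumption introduced for this sketch: E₁₁ and E₂₂ denote the error sums of squares of the two concomitants, E₁₂ the error sum of their products, and E₁y, E₂y the error sums of products of each with the variate y to be adjusted.

```latex
\begin{aligned}
b_1 E_{11} + b_2 E_{12} &= E_{1y},\\
b_1 E_{12} + b_2 E_{22} &= E_{2y},
\end{aligned}
\qquad\text{whence}\qquad
b_1 = \frac{E_{1y}E_{22} - E_{2y}E_{12}}{E_{11}E_{22} - E_{12}^{2}},
\quad
b_2 = \frac{E_{2y}E_{11} - E_{1y}E_{12}}{E_{11}E_{22} - E_{12}^{2}} .
```

The adjusted value of y is then obtained by subtracting b₁ and b₂ times the deviations of the respective concomitants from their general means.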

58. The Test of Significance 

We have now to evaluate this gain in precision so 
that the significance of the responses to different 
treatments may be tested after adjustment. Since the 
adjustment has been obtained from the error term 
we may regard 1 of the degrees of freedom ascribed 
to error as having been utilised in evaluating it. 
Supposing, that is, that only one concomitant variate 
has been used and, therefore, only one coefficient has 
been evaluated. In general the number of degrees of 
freedom utilised is equal to the number of concomitant 
variates. After allowing, therefore, for the initial 
weights of the animals in our experiment there will 
remain only 44 degrees of freedom for the estimation 
of error if the animals have been assigned wholly at 
random, and only 35 degrees of freedom if they have 
been assigned at random within the litters. The 
deduction to be made from the sum of squares ascribed 
to error, in the analysis of the final weights, due to 
the removal of this 1 degree of freedom, is easily 
calculated. It consists of the square of the error 
component in the analysis of covariance, divided by 
the error component of the analysis of variance of 
the initial weights. After deducting this portion, the 
sum of squares ascribed to error may be divided by 
the degrees of freedom, to obtain the mean square 
appropriate to testing the significance of differences 
among the adjusted final weights. This use is entirely 
appropriate only if, as should be the case in a properly 
randomised experiment, the differences among the 
mean initial weights of the different groups of animals 
are small compared with the differences amongst 
animals of the same group from which the adjustment 
has been evaluated. If this were not so the adjusted 
values would in some measure be also affected by the 
errors of estimation of the value of the adjustment 
applied. It is, therefore, a useful resource to apply a 
test of significance to the adjusted values, or any 
component of them of special interest, which shall 
take full account of the inexactitude of our estimate. 
We may illustrate the procedure for the case in which 
sets of five pigs from ten different litters have been 
assigned at random to the five feeding mixtures. 

In this case 9 degrees of freedom representing 
differences between litters have been eliminated from 
the experimental comparisons, and from the estimate 
of error. With these we are no longer concerned. 
The sum of squares corresponding to the 35 degrees 
of freedom for the estimation of error, after adjust- 
ment, has been evaluated by means of deducting the 
square of a term from the analyses of covariance, 
divided by the corresponding term in the analysis of 
variance of the initial yields. The same process is 
now applied using the sum of the components for 
treatment and error from the same tables. This gives 
us the sum of squares corresponding to 39 degrees 
of freedom, for 1 has been deducted from the 40 
originally available. Subtracting now the portion 
obtained for error, the difference represents the 
4 degrees of freedom ascribable to treatments, after 
exact allowance has been made for the sampling error 
of the coefficient used in their adjustment. The sum 
of squares for these 4 degrees of freedom may, there- 
fore, be compared with that for the 35 degrees of 
freedom due to error, as in an ordinary analysis of 
variance, in which no concomitant variate has been 
eliminated. The sum of squares ascribed to treatment 
by this method will be found to be somewhat less than 
the corresponding value derived from the adjusted 
means and totals, although these adjusted values are 
the best available from the experiment, only because 
a calculable portion of the variance among them is 
ascribable to the sampling error of the estimated 
rate of allowance, which portion it is proper to remove 
in making an exact test of the significance of the 
variation observed. In cases where the concomitant 
variation has not been properly randomised, the 
omission of this precaution may lead to serious errors, 
but in such cases the possibility of testing significance 
accurately is always questionable. 
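
The steps of this test may be sketched as follows. The function names are assumptions, and the sums of squares and products in the example are invented figures chosen only to make the arithmetic easy to follow; the degrees of freedom are those of the litter arrangement (4 for treatments, 36 for error before adjustment).

```python
def reduced_sum_of_squares(sxx, sxy, syy):
    """Sum of squares of y after deducting the square of the sum of
    products divided by the sum of squares of x (1 degree of freedom)."""
    return syy - sxy * sxy / sxx


def adjusted_test(treat, error, df_treat, df_error):
    """Variance ratio for treatments after adjustment for one concomitant.

    `treat` and `error` are (Sxx, Sxy, Syy) triples taken from the three
    analyses; `df_error` is the error degrees of freedom before adjustment.
    """
    error_adj = reduced_sum_of_squares(*error)
    # Apply the same process to the sum of the treatment and error lines.
    combined = tuple(t + e for t, e in zip(treat, error))
    combined_adj = reduced_sum_of_squares(*combined)
    # The difference belongs to treatments, with exact allowance for the
    # sampling error of the coefficient.
    treat_adj = combined_adj - error_adj
    mean_square_error = error_adj / (df_error - 1)
    ratio = (treat_adj / df_treat) / mean_square_error
    return ratio, treat_adj, error_adj


# Invented figures for illustration only: (Sxx, Sxy, Syy) for each line.
treatments_line = (20.0, 30.0, 400.0)
error_line = (180.0, 270.0, 900.0)

ratio, treat_adj, error_adj = adjusted_test(treatments_line, error_line, 4, 36)
print(ratio, treat_adj, error_adj)
```

The ratio so obtained would be referred to the usual tables of the variance ratio with 4 and 35 degrees of freedom.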

59. Practical Example 

Table 30 shows the arrangement of an experiment, 
carried out in sugar-beet, at Good Easter, Chelmsford, 
in 1932, by the National Institute of Agricultural 
Botany. Three varieties, a, b and c, are tested in 
combination with eight manurial mixtures, respectively 
containing and lacking sulphate of ammonia, at the 
rate of 0.6 cwt. of nitrogen per acre, superphosphate 
at the rate of 0.5 cwt. P₂O₅ per acre, and chloride of 
potash at the rate of 0.75 cwt. K₂O per acre. The 
twenty-four combinations of the three varieties with 
the eight manurings are arranged in four randomised 
blocks in the order shown. In each plot the first 
number is the number of plants lifted, while the 
second is the weight in pounds of washed roots. 

In carrying out the analysis it should be observed 
that the varieties show significant differences in plant 
number, so that the yields in root weight adjusted for 
plant number will not necessarily represent varietal 
differences in yield under any uniform system of field 
treatment, but should represent yield differences for 
equal plant establishment. 



TABLE 30 

Arrangement, Plant Number and Yield, of Combined Manurial 
and Varietal Experiment with Sugar-beet 
(Rothamsted Experimental Station Report, 1932) 

When variation in plant number is not large a 
proportional allowance, based on a simple regression 
coefficient, is often entirely adequate. Theoretically, 
however, we should not expect the relationship 
between yield and plant number to be represented 
by a straight line over a wide range, but rather by 
a curve, having a maximum within or outside the 
range of the observations. To deal with curved 
regression, when it seems to be advantageous, it is 
only necessary to introduce not only the plant number 
but its square also as a second concomitant observa- 
tion, and to treat these exactly as though they were 
two independent variables. The fact that one of 
these may be calculated from the other does not in 
any respect interfere with their use in this way, and, 
of course, in special cases, the same principle may be 
used to introduce more complicated curves. Sound 
judgment as to the probable value of such elaborations, 
in comparison with the work required, can only be 
gained by trying them on bodies of actual observations, 
such as those shown in the table. 
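
As a sketch of the device, the following treats a variate x and its square as two concomitants and solves the two resulting equations. The function names and the five pairs of figures are invented for illustration (they are not the sugar-beet data), and for brevity the sums of squares and products are formed within a single group of observations rather than from the error line of a full analysis.

```python
def centred_product(u, v):
    """Sum of products of the deviations of u and v from their means."""
    n = len(u)
    return sum(a * b for a, b in zip(u, v)) - sum(u) * sum(v) / n


def curved_regression(x, y):
    """Coefficients of x and x**2, treated as two concomitant variates."""
    x2 = [value * value for value in x]
    e11 = centred_product(x, x)
    e12 = centred_product(x, x2)
    e22 = centred_product(x2, x2)
    e1y = centred_product(x, y)
    e2y = centred_product(x2, y)
    # Solve the two linear equations of partial regression.
    determinant = e11 * e22 - e12 * e12
    b1 = (e1y * e22 - e2y * e12) / determinant
    b2 = (e2y * e11 - e1y * e12) / determinant
    return b1, b2


# Invented figures lying exactly on y = 3 + 4x - 0.5x^2 (maximum at x = 4).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [6.5, 9.0, 10.5, 11.0, 10.5]

b1, b2 = curved_regression(x, y)
print(b1, b2)
```

The negative coefficient of the square reflects a curve with a maximum, here falling within the range of the observations.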
THE GENERALISATION OF NULL HYPOTHESES. 

FIDUCIAL PROBABILITY 

60. Precision regarded as Amount of Information 

The foregoing Chapters, III. to IX., have been 
devoted to cases to which the theory of errors is 
appropriate. That is to say, to cases in which the 
experimental result sought, is found by testing the 
significance of the deviations shown by the observa- 
tions, from a null hypothesis of a particular kind. In 
this kind of hypothesis all discrepancies classified as 
error, and not eliminated from our comparisons by 
equalisation or regression, are due to variation, in the 
material examined, following the normal law of errors 
with a definite and constant, but unknown, variance. 

Granting the appropriateness of null hypotheses 
of this kind, our purpose has been to diminish the 
magnitude of the error components in the comparisons, 
and a number of devices have been illustrated by 
which this can be done, while at the same time the 
requirement can be satisfied that the experiment shall 
supply a valid estimate of the magnitude of the 
residual errors, by which the comparisons are still 
affected. In general, it has been seen that, with 
repeated experimentation on like material, the variance 
ascribable to error falls off inversely to the number of 
replications, so that in measuring the effectiveness of 
methods of reducing the error, an appropriate scale 
is provided by the inverse of the variance, or 
invariance, as it is sometimes called, of the averages 
determined by the experiment. 

If, therefore, any such average is determined with 
a sampling variance V, we may define a quantity 
such that I = 1/V, and I will measure the quantity 
of information supplied by the experiment in respect 
of the particular value to which the variance refers. 
Information, of course, like other quantities, may be 
measured in units of different sizes, according to the 
subject under discussion. Thus, with agricultural 
yields, it is convenient to consider an experiment 
giving a standard error of 10 per cent. as supplying 
one unit of information. One giving a standard error 
of only 5 per cent. will, therefore, supply four units. 
An experiment with a standard error of 2 per cent. 
will yield twenty-five, and one with a standard error 
of 1 per cent. will yield a hundred of such units. 
The amount of information is thus measurable on a 
scale inverse to the variance, or inverse to the square 
of the standard error. 
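
The scale of units just described is easily reproduced. In the sketch below the function name is an assumption; the unit of information is taken, as in the text, to be that of an experiment with a standard error of 10 per cent.

```python
def information_units(standard_error_percent, unit_standard_error_percent=10.0):
    """Information relative to an experiment having the unit standard error.

    Information is inverse to the variance, that is, to the square of the
    standard error.
    """
    return (unit_standard_error_percent / standard_error_percent) ** 2


for standard_error in (10, 5, 2, 1):
    print(standard_error, information_units(standard_error))
```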

One immediate consequence of this method of 
evaluation is that when an experimental programme 
is enlarged by simple repetition on like material, the 
amount of information gained is proportional to the 
labour and expense incurred. Consequently, we may 
ascertain the cost, per unit of information gained, 
of any type of experimentation of which we have 
adequate experience ; or, if we wish, using the data 
from any single large experiment. The cost of 
attaining any desired level of precision, or of gaining 
any desired amount of information by the same method 
is thus easily calculable. What is more important, 
the relative costliness of different methods of experi- 
mentation may be directly compared, and the saving 
effected by improved methods of design, or by the 
use of concomitant observations, may be given an 
entirely objective and tangible value. 

In such calculations it is important that the items 
of labour and skilled supervision chargeable to a 
particular method of experimentation, shall be fairly 
and carefully recorded and calculated. For any time 
and labour devoted to experimental work must be 
regarded as having been diverted from other work 
of scientific value, to which they might otherwise 
have been given. Even rough costings of this kind 
will usually show that the efficiency with which 
limited resources can be applied, is capable of relatively 
enormous increases by careful planning of the experi- 
mental programme, and there is nothing in the nature 
of scientific work which requires that the allocation 
of the resources to the ends aimed at should be in any 
degree rougher, or less scrupulous, than in the case 
of a commercial business. The waste of scientific 
resources in futile experimentation has, in the past, 
been immense in many fields. One important cause 
at least of this waste has been a failure to utilise past 
experience in evaluating the precision attainable by 
an experiment of given magnitude, and in planning to 
work on a scale sufficient to give a practically useful 
result. 
A serious consequence of the neglect to make 
systematically estimates of the efficiency of different 
methods of experimentation is the danger that satis- 
factory methods, or methods which with further 
improvement are capable of becoming satisfactory, 
may be overlooked, or discarded, in favour of others 
enjoying a temporary popularity. Fashions in 
scientific research are subject to rapid changes. Any 
brilliant achievement, on which attention is tem- 
porarily focused, may give a prestige to the method 
employed, or to some part of it, even in applications 
to which it has no special appropriateness. The 
teaching given in universities to future research 
workers is often particularly unbalanced in this 
respect, possibly because the university teacher cannot 
give his whole time to the study of the practical 
aspects of research problems, possibly because he un- 
wittingly emphasises the importance of the particular 
procedures with which he is best acquainted. 

61. Multiplicity of Tests of the same Hypothesis 

The concept of quantity of information is appli- 
cable to types of experimentation and of observational 
programmes other than those for which the theory of 
errors supplies the appropriate null hypotheses. Before 
considering these, as will be done in the following 
chapter, it is advisable to consider a somewhat more 
elaborate logical situation than that introduced in 
Chapter II. It was there pointed out that, in order 
to be used as a null hypothesis, a hypothesis must 
specify the frequencies with which the different results 
of our experiment shall occur, and that the 
interpretation of the experiment consisted in dividing these 
results into two classes, one of which is to be judged 
as opposed to, and the other as conformable with the 
null hypothesis. If these classes of results are chosen, 
such that the first will occur when the null hypothesis 
is true with a known degree of rarity in, for example, 
5 per cent. or 1 per cent. of trials, then we have a 
test by which to judge, at a known level of significance, 
whether or not the data contradict the hypothesis to 
be tested. 

We may now observe that the same data may 
contradict the hypothesis in any one of a number of 
different ways. For example, in the psycho-physical 
experiment (Chapter II.) it is not only possible for 
the subject to designate the cups correctly more often 
than would be expected by chance, but it is also 
possible that she may do so less often. Instead of 
using a test of significance which separates from the 
remainder a group of possible occurrences, known to 
have a certain small probability when the null hypo- 
thesis is true, and characterised by showing an excess 
of correct classifications, we might have chosen a test 
separating an equally infrequent group of occurrences 
of the opposite kind. The reason for not using this
latter test is obvious, since the object of the experiment 
was to demonstrate, if it existed, the sensory dis- 
crimination of a subject claiming to be able to 
distinguish correctly two classes of objects. For this 
purpose the new test proposed would be entirely 
inappropriate, and no experimenter would be tempted
to employ it. Mathematically, however, it is as valid
as any other, in that with proper randomisation it is 
demonstrable that it would give a significant result 
with known probability, if the null hypothesis were 
true. 
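
The equal validity of the two tests can be checked by direct enumeration. The sketch below assumes the design of the psycho-physical experiment as it is usually described: eight cups, four of each kind, with the subject asked to pick out four. The cup counts and the function name are assumptions made for illustration and are not stated in this section.

```python
from fractions import Fraction
from math import comb


def null_distribution(cups_per_kind=4):
    """Probability of k correct selections (k = 0..n) under the null hypothesis,
    assuming n cups of each kind and a subject who selects exactly n cups."""
    n = cups_per_kind
    total = comb(2 * n, n)  # number of equally likely selections
    return [Fraction(comb(n, k) * comb(n, n - k), total) for k in range(n + 1)]


dist = null_distribution()
p_excess = dist[-1]   # every cup designated correctly
p_deficit = dist[0]   # every cup designated wrongly
print(p_excess, p_deficit)
```

Under these assumptions the two extreme results are equally rare, so a test based on either has the same level of significance; only the purpose of the experiment distinguishes them.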

Again, in Darwin’s experiment on growth rate 
discussed in Chapter III., it has been shown that the 
test of significance using “Student’s” t is appropriate
to the question with a view to which the experiment 
was carried out. Many other tests, however, less 
appropriate in this regard, or quite inappropriate, 
might have been applied to the data. Such tests may 
be made mathematically valid by ensuring that they 
each separate, for purposes of interpretation, a group 
of possible results of the experiment having a known 
and small probability, when the null hypothesis is true. 
For this purpose any quantity might have been 
calculated from the data, provided that its sampling 
distribution is completely determined by the null 
hypothesis, and any portion of the range of distribution 
of this quantity could be chosen as significant, provided 
that the frequency with which it falls in this portion of 
its range is .05 or .01, or whatever may be the level
of significance chosen for the test. 

Some such tests would be of no interest in any
circumstances with which experimenters are familiar. 
Others, though not appropriate to the object Darwin 
had in view, might be appropriate to an experimenter 
studying a different subject. Thus, if the aim of the 
experiment had been, not to ascertain whether the 
average height of the cross-fertilised plants was, or
was not, greater than that of the self-fertilised, but
whether the difference in height between the cross- 
and self-fertilised plants of any pair was distributed 
normally, or in an unsymmetrical distribution, a valid 
test appropriate to this point could be devised. In 
addition to calculating, as in Chapter III., the sum 
of the squares of the deviations of these differences 
from their mean, we might calculate the sum of the 
cubes of these differences, having regard to their signs, 
and the ratio of the latter sum to the former raised
to the power of 3/2, may be shown, on the null 
hypothesis, to have a determinate distribution for a 
given number of pairs of plants. The exact form of
this distribution is at present unknown, since the 
distributional problem here considered is not one of 
those that have been solved. Nothing, however, but 
lack of mathematical knowledge, prevents us from 
stating exactly outside what limits the ratio must lie, 
to have a given level of significance. This test would
pick out as statistically significant quite different sets
of experimental results from those selected by the t
test. It is in no sense a substitute for that test, or
suited to perform the same functions. It is designed 
to answer a different question, although in both cases 
the question is answered by selecting a group of 
possible experimental results deemed to contradict 
the same null hypothesis. They may properly be 
thought of as testing different features of this hypo- 
thesis. The hypothesis tested in both cases states 
that the distribution of differences in height is centred
at zero and is normal in form. The one test is
appropriate when we are interested especially in the
possibility that it is not centred at zero. In this case 
the question of normality is, as has been shown, of 
quite trivial importance. The other is appropriate 
when we are interested in the possibility that the 
distribution is skew, or unsymmetrical about its mean, 
and in this case, the value of the mean is entirely 
irrelevant. 
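
A minimal sketch of the skewness criterion described above: the sum of the cubes of the deviations divided by the 3/2 power of the sum of their squares. It reads "these differences" as deviations from their mean, which is an interpretation, though it is the one under which the value of the mean is irrelevant. The sample values are invented purely for illustration.

```python
def skewness_ratio(values):
    """Sum of cubed deviations over (sum of squared deviations) ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    devs = [v - mean for v in values]
    sum_sq = sum(d ** 2 for d in devs)
    sum_cu = sum(d ** 3 for d in devs)  # signs are retained by the odd power
    return sum_cu / sum_sq ** 1.5


sample = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0]  # hypothetical differences
shifted = [v + 10.0 for v in sample]            # same shape, different mean
scaled = [2.5 * v for v in sample]              # same shape, different variance
print(skewness_ratio(sample), skewness_ratio(shifted), skewness_ratio(scaled))
```

The ratio is unchanged by shifting or rescaling the observations, which is why its sampling distribution depends on neither the mean nor the variance of the normal population.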

The notion that different tests of significance are
appropriate to test different features of the same null 
hypothesis presents no difficulty to workers engaged 
in practical experimentation, but has been the occasion 
of much theoretical discussion among statisticians. 
The reason for this diversity of view-point is perhaps 
that the experimenter is thinking in terms of observa- 
tional values, and is aware of what observational 
discrepancy it is which interests him, and which he 
thinks may be statistically significant, before he 
enquires what test of significance, if any, is available 
appropriate to his needs. He is, therefore, not 
usually concerned with the question : To what 
observational feature should a test of significance be 
applied ? This question, when the answer to it is not 
already known, can be fruitfully discussed only when 
the experimenter has in view, not a single null 
hypothesis, but a class of such hypotheses, in the 
significance of deviations from each of which he is 
equally interested. We shall, later, discuss in more 
detail, the logical situation created when this is the 
case. It should not, however, be thought that such 
an elaborate theoretical background is a normal
condition of experimentation, or that it is needed for
the competent and effective use of tests of significance. 

62. Extension of the t Test 

In hypotheses, based on the theory of errors, there 
is, however, one extension which is normally held in 
view, and which, for the great simplicity of its conse- 
quences, is well fitted to introduce the more complex 
situations in which methods of statistical estimation 
require to be discussed. In Chapter III. we illustrated
“Student’s” t test of significance with Darwin’s data
on the growth of young maize plants. The hypothesis 
to be tested was that the difference in height, between 
the cross-fertilised and the self-fertilised plant of the 
same pair, was distributed in some normal distribution 
about zero as its mean. We might, however, have 
considered a similar hypothesis, giving to the mean 
difference any other number, positive, negative, or 
fractional, of inches. If, instead of testing whether 
or not the mean could have been zero we had chosen 
to test, whether or not it had any unspecified value, 
μ, measured in eighths of an inch, then the deviation
of our observed mean, 20.93, from the hypothetical
value, μ, is

\[ 20.93 - \mu \]

and this quantity, on the hypothesis to be tested, will
be distributed normally about zero with a standard
deviation, of which we have an estimate based
on 14 degrees of freedom, the value of which is
9.746. Consequently, if

\[ t = \frac{20.93 - \mu}{9.746} \]

then t will be distributed in the distribution given by 
“Student” for 14 degrees of freedom, a distribution
which is known with exactitude independently of
the observations. We have the important logical
situation, in which a quantity, t, having a sampling
distribution known with precision, is expressible in
terms of an unknown and hypothetical quantity, μ,
together with other quantities known exactly by
observation. We say known exactly, because the 
mathematical relations stated are true of the actual 
values derived from the observations, and not of the 
hypothetical values of which they might be regarded 
as estimates. Such actual values derived from the 
observation are distinguished by the term statistics, 
from the parameters, or hypothetical quantities intro- 
duced to specify the population sampled. 

An important application, due to Maskell, is to
choose the values of t appropriate to any chosen level
of significance, and insert them in the equation. Thus
t has a 5 per cent. chance of lying outside the limits
±2.145. Multiplying this value by the estimated
standard deviation, 9.746, we have 20.90 and may
write

\[ \mu = 20.93 \pm 20.90 = 0.03 \text{ or } 41.83 \]

as the corresponding limits for the value of μ.

One familiar way of viewing this result is, that the
experiment has provided an estimate, 20.93 eighths
of an inch, of the average difference, μ, between the
heights of two sorts of plants; that this estimate
has a standard error 9.746; and that “Student’s”
distribution shows that, for 14 degrees of freedom,
the 5 per cent. level of significance is reached when we
pass outside the limits ±2.145 times the standard
error. An alternative view of the matter is to consider
that variation of the unknown parameter, μ, generates
a continuum of hypotheses, each of which might be
regarded as a null hypothesis, which the experiment
is capable of testing. In this case the data of the
experiment, and the test of significance based upon
them, have divided this continuum into two portions.
One, a region in which μ lies between the limits
0.03 and 41.83, is accepted by the test of significance,
in the sense that values of μ within this region are not
contradicted by the data, at the level of significance
chosen. The remainder of the continuum, including
all values of μ outside these limits, is rejected by the
test of significance.
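
The division of the continuum of hypotheses into an accepted and a rejected portion can be sketched in a few lines. The figures assumed are a mean difference of 20.93 eighths of an inch, a standard error of 9.746, and the 5 per cent. value 2.145 of t for 14 degrees of freedom; the function names are illustrative only.

```python
def fiducial_limits(mean, std_error, t_crit):
    """Values of mu for which |t| equals the chosen critical value."""
    half_width = t_crit * std_error
    return mean - half_width, mean + half_width


def is_rejected(mu, mean, std_error, t_crit):
    """True when the hypothesis 'true mean = mu' is contradicted by the data."""
    t = (mean - mu) / std_error
    return abs(t) > t_crit


MEAN, STD_ERROR, T_5PC = 20.93, 9.746, 2.145
lower, upper = fiducial_limits(MEAN, STD_ERROR, T_5PC)
print(lower, upper)
print(is_rejected(0.0, MEAN, STD_ERROR, T_5PC))   # the original null hypothesis
print(is_rejected(10.0, MEAN, STD_ERROR, T_5PC))  # a value inside the limits
```

The computed limits agree with 0.03 and 41.83 to within about a hundredth; the small discrepancy arises only from rounding the product 2.145 × 9.746 to 20.90 before adding and subtracting.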

It can now be seen, that the t test is not only 
valid for the original null hypothesis that the mean 
difference is zero, but is particularly appropriate to an 
experimenter, who has in view the whole set of 
hypotheses obtained by giving μ different values.
The reason is that the two quantities, the sum and the 
sum of squares, calculated from the data, together 
contain all the information supplied by the data, 
concerning the mean and variance of the hypothetical 
normal curve. Statistics possessing this remarkable
property are said to be sufficient, because no others
can, in these cases, add anything to our information.
The peculiarities presented by t, which give it its 
unique value for this type of problem are : 

(i) Its distribution is known with exactitude,
    without any supplementary assumptions
    or approximations.

(ii) It is expressible in terms of the single
    unknown parameter, μ, together with
    known statistics only.

(iii) The statistics involved in this expression
    are sufficient.

63. The χ² Test

What is meant by choosing a test of significance 
appropriate to a special purpose, may now be illus- 
trated by considering what should be done if the 
experimenter were interested, not in whether the mean 
of the distribution could exceed a given value, or 
could lie in a given range, but in the value of the 
variance of the same distribution. What is now 
needed is a test of significance provided by a quantity 
(i) having a precisely known distribution, and (ii) 
expressible in terms of the unknown variance, φ, of
the distribution sampled, together with sufficient 
statistics only. 

Now, if χ²/n is the ratio of the variance, as
estimated from the sample for n degrees of freedom,
to the true variance, φ, it is known that χ² is
distributed, independently of the mean and variance
of the population sampled, in a distribution which is
known when n is known. If we wish to set a probable
upper limit to the value of φ, we note that for n = 14,
χ² is less than 6.571 * in only 5 per cent. of trials.
Putting this value for χ² in the equation,

\[ \chi^2 = \frac{19945}{\phi}, \]

we have

\[ \phi = 3035. \]

In other words, variances exceeding 3035, or
standard deviations exceeding 55.09, are rejected at
the 5 per cent. level of significance.

Equally, had we wished to set a probable lower
limit to the value of φ we should have noted that χ²
exceeds the value 23.685 in only 5 per cent. of trials.
Consequently, the rejection of the 5 per cent. of values
of χ² which are highest will exclude values of φ less
than 19945/23.685, or 844. We may thus reject values
of the variance below 844, or of the standard deviation
below 29.06, at the 5 per cent. level of significance. If,
however, we rejected values for the standard deviation
both below 29.06, and above 55.09, we should be
rejecting both of two sets of contingencies each having
a probability of 5 per cent., and so should be working,
not at the 5 per cent., but at the 10 per cent. level
of significance. If he wishes to work at the 5 per cent.
level, the experimenter has the choice, according to
the purpose of his researches,

(i) of ascertaining an upper limit for the
    unknown variance without rejecting any
    lower values,

(ii) of ascertaining a lower limit without rejecting
    any higher value, or

(iii) of ascertaining a pair of limits beyond
    which values are rejected, representing
    two frequencies totalling 5 per cent.
    together.

* For Table of χ² see Statistical Methods for Research Workers, Table III.
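
The statement that rejecting both tails works at the 10 per cent. level can be verified without tables. For an even number of degrees of freedom the tail probability of χ² is a finite sum, a standard identity, which the sketch below uses in place of any statistical library. The names are illustrative, and 19945 is the sum of squares appearing in the equation above.

```python
from math import exp, factorial, sqrt


def chi2_sf_even(x, df):
    """P(chi-square > x) for an even, positive number of degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs an even, positive df")
    half = x / 2.0
    return exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))


DF = 14
SUM_SQ = 19945.0

lower_tail = 1.0 - chi2_sf_even(6.571, DF)   # chance that chi-square falls below 6.571
upper_tail = chi2_sf_even(23.685, DF)        # chance that chi-square exceeds 23.685
both_tails = lower_tail + upper_tail         # level used if both limits are applied

upper_limit_variance = SUM_SQ / 6.571        # option (i): upper limit only
upper_limit_sd = sqrt(upper_limit_variance)
print(lower_tail, upper_tail, both_tails)
print(upper_limit_variance, upper_limit_sd)
```

Each tail comes out very close to 5 per cent., so applying both limits at once corresponds to option (iii) only if the two tail frequencies are reduced so as to total 5 per cent. together.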

The tests appropriate for discriminating among 
a group of hypothetical populations having different 
variances are thus quite distinct from those appro- 
priate to discriminating among distributions having 
different means. Within the limits of the theory 
of errors, the mean and the variance are the only 
two quantities needed to specify the hypothetical 
population. It is the circumstance that statistics 
sufficient for the estimation of these two quantities 
are obtained merely from the sum and the sum of 
squares of the observations, that gives a peculiar 
simplicity to problems for which the theory of errors 
is appropriate. This simplicity appears in an alter- 
native form of statement, which is legitimate in these 
cases, namely, statements of the probability that the 
unknown parameters, such as μ and φ, should lie
within specified limits. Such statements are termed 
statements of fiducial probability, to distinguish them 
from the statements of inverse probability, by which 
mathematicians formerly attempted to express the 
results of inductive inference. Statements of inverse 
probability have a different logical content from 
statements of fiducial probability, in spite of their 
similarity of form, and require for their truth the
postulation of knowledge beyond that obtained by
direct observation. 

In the discussion above of the results of Darwin’s 
experiment, instead of saying that at the 5 per cent.
level of significance, we should reject hypothetical
variances exceeding 3035, it would be equivalent to
say that the fiducial probability is 5 per cent. that
the variance should exceed 3035. Equally, the fiducial
probability is 5 per cent. that it should be less than
844; consequently, it is 10 per cent. that it should
lie outside the range between these two numbers.
With respect to the mean, it may in the same way be
said that it has a fiducial probability of 2½ per cent.
of being less than 0.03, or of being greater than
41.83, and, in the same sense, a probability of
95 per cent. of lying within these fiducial limits.

64. Wider Tests based on the Analysis of Variance 

In the more general type of problem, to which the 
z test is applicable, and in which we may have an 
analysis of the variance into a considerable number 
of subdivisions, we have a wide choice of tests of 
significance, each appropriate to answering a different 
question. Logically, these questions refer to the 
acceptance or rejection of different hypotheses or sets 
of hypotheses, and it will be useful to discuss them 
explicitly from this point of view. The practically 
useful variations are those which concern the hypo- 
thetical means of the different classes of observations 
which have been made. 

As an example let us consider the 6 × 6 Latin
square, of which numerical observations were given
in Chapter V. The arithmetical analysis obtained 
by the method there described is set out below. 


TABLE 31

                Degrees of     Sum of        Mean        ½ Log_e.
                Freedom.       Squares.      Square.
Rows                 5          54,199
Columns              5          24,467
Treatments           5         248,180       49,636       1.9524
Error               20          30,541        1,527       0.2117
Total               35         357,387                    1.7407


It will be seen that the value obtained for z was
1.7407. The 1 per cent. level for 5 degrees of freedom
against 20 is .7058. Consequently, the data very
significantly contradict the hypothesis that all treatments
were giving the same yield. We might, if it
seemed appropriate, go further and say that if ζ
stands for the true value of which z is an estimate,
then all hypotheses which make ζ less than 1.0349
are contradicted by the data at the 1 per cent. level
of significance. The hypothesis that the treatments
do not affect the yield makes ζ = 0. The wider
hypothesis that the yields produced by the different
treatments are a random sample from a normal
distribution, will provide an indeterminate positive
value for ζ. If ζ were 1.0349 the mean square
ascribed to treatments would be 7.923 times that
ascribed to error. Since the mean square ascribed
to treatments includes also the variability due to
sampling error, the portion due to the effects of
treatments themselves cannot be less than 6.923 times
as great as the variance due to error in our estimates
of the mean yields from the several treatments.

The mean square due to error has been found to 
be 1527, and this is the variance ascribable to error 
of a single plot. Dividing by 6 we find that the 
variance of the mean of six plots is 254.5. Multiplying
this by 6.923 we have 1761.9 as the least admissible
value for the variance due to treatments. The
standard deviation corresponding to this variance
is 41.97, or just over 9 per cent. of the mean yield,
462.75, observed in the experimental plots. It may
be noticed that, apart from the inappropriateness, in 
the present instance, of the hypothesis that the 
treatment effects constitute a sample from a normal 
distribution, the calculation above departs from strict 
rigour in accepting the estimate of error based on 
20 degrees of freedom, without making special 
allowance for the fact that this estimate is itself 
liable to sampling errors. 
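
The chain of arithmetic in the last two paragraphs may be retraced as follows. This is only a sketch: the constants are the figures quoted in the text, the variable names are illustrative, and, like the text, it makes no allowance for sampling error in the estimate of error.

```python
from math import exp, log, sqrt

MS_TREATMENTS = 49636.0    # mean square for treatments, 5 degrees of freedom
MS_ERROR = 1527.0          # mean square for error, 20 degrees of freedom
Z_1PC = 0.7058             # tabulated 1 per cent. value of z for 5 against 20
PLOTS_PER_TREATMENT = 6
MEAN_YIELD = 462.75

z = 0.5 * log(MS_TREATMENTS / MS_ERROR)      # observed z
zeta_min = z - Z_1PC                         # least value of zeta not contradicted
ratio_min = exp(2.0 * zeta_min)              # corresponding ratio of mean squares
var_of_mean = MS_ERROR / PLOTS_PER_TREATMENT
var_treatments_min = (ratio_min - 1.0) * var_of_mean
sd_treatments_min = sqrt(var_treatments_min)

print(z, zeta_min, ratio_min)
print(var_treatments_min, sd_treatments_min, sd_treatments_min / MEAN_YIELD)
```

Working without intermediate rounding gives values that differ from those printed only in the last figure.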

We have treated the experiment above as though 
nothing were known of the treatments applied, or as 
though these were regarded merely as causes disturb- 
ing the yields with an unknown variance. Actually, 
it is known that the treatments D, E, F differ from
A, B, C in including an additional nitrogenous
dressing, while A, B, C and, in like manner, D, E, F
differ among themselves in receiving respectively
0, 1, 2 units of a phosphatic dressing. The 5 degrees
of freedom ascribed to treatments are, therefore, not
plausibly to be considered as homogeneous among
themselves, but may properly be subdivided, as we
have seen in previous chapters, into unitary elements 
of very different agricultural importance. 

The total yields of the six treatments are set out 
below in relation to the manurial treatment received. 


TABLE 32

                        No
                        Nitrogen.    Nitrogen.    Total.    Difference.
No phosphate              2070         2431        4501        361
Single phosphate          2559         3121        5680        562
Double phosphate          2867         3611        6478        744
Total                     7496         9163       16,659      1667


The eighteen plots receiving the nitrogenous dressing
exceed the remaining eighteen plots in yield by 1667,
so that this degree of freedom, N, contributes 1667²/36,
or 77,191, to the total 248,180 ascribed to treatments.
The second degree of freedom of primary
importance, P₁, is found by subtracting the yield of
the twelve plots without phosphate from that of the
twelve plots with double phosphate. The difference
is 1977, so that the contribution of this degree of
freedom is 1977²/24, or 162,855. This is the primary
effect of phosphate. The second degree of freedom of
this ingredient, P₂, may be found by observing that
the increment in the yield of twelve plots, due to
a single application, is 1179, while the additional
increment due to the second application is 798. The
difference is 381, and represents the excess of twice
the yields for a single application over those for 0
and 2 units. The contribution of this degree of
freedom is therefore 381²/72, or 2016. Similarly,
using the differences between the yields with and
without nitrogen, instead of the sums, we find for
NP₁, 383²/24 = 6112 and for NP₂, 19²/72 = 5.
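
The subdivision just described can be reproduced from the six treatment totals of Table 32. In the sketch below each degree of freedom is a contrast among those totals; its divisor is the number of plots in each total (six) multiplied by the sum of the squared coefficients, which yields the divisors 36, 24 and 72 used above. The layout of the coefficient lists is an illustrative choice.

```python
# Treatment totals from Table 32, indexed by phosphate level 0, 1, 2.
NO_NITROGEN = [2070, 2559, 2867]
NITROGEN = [2431, 3121, 3611]
PLOTS_PER_TOTAL = 6  # each total is the sum of six plots of the Latin square

# Coefficients applied to (no-nitrogen totals, nitrogen totals).
CONTRASTS = {
    "N":   ([-1, -1, -1], [1, 1, 1]),
    "P1":  ([-1, 0, 1],   [-1, 0, 1]),
    "P2":  ([-1, 2, -1],  [-1, 2, -1]),
    "NP1": ([1, 0, -1],   [-1, 0, 1]),
    "NP2": ([1, -2, 1],   [-1, 2, -1]),
}


def contrast(coeffs_no_n, coeffs_n):
    """Return (value of the contrast, its sum of squares on 1 degree of freedom)."""
    value = sum(c * y for c, y in zip(coeffs_no_n, NO_NITROGEN))
    value += sum(c * y for c, y in zip(coeffs_n, NITROGEN))
    divisor = PLOTS_PER_TOTAL * sum(c * c for c in coeffs_no_n + coeffs_n)
    return value, value ** 2 / divisor


results = {name: contrast(*coeffs) for name, coeffs in CONTRASTS.items()}
for name, (value, ss) in results.items():
    print(name, value, round(ss))
print(sum(ss for _, ss in results.values()))
```

Because the five contrasts are mutually orthogonal, their sums of squares add up, apart from rounding, to the 248,180 ascribed to treatments.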

We are now at liberty to discuss the significance 
of each degree of freedom severally and, because 
the experiment is a well-designed one, we shall find 
that to each corresponds a system of appropriate 
hypotheses relevant to the aims of the experiment. 
Thus, the 1 degree of freedom due to nitrogen has
a mean square 50.56 times as great as that due to
error. The value of “Student’s” t is therefore about
7.110. The 5 per cent. value of t for 20 degrees of
freedom is 2.086, so that at this level of significance
we may exclude all hypotheses ascribing to the
nitrogenous dressing less than 5.024/7.110, or more
than 9.196/7.110 of the apparent benefit observed.
This conclusion refers directly to the manurial 
comparison on which it is based, and is entirely 
independent of the other conclusions to be drawn 
from the experiment as to the effects of phosphate or 
of its interactions with nitrogen. Naturally, also, if 
there were no such interactions, the conclusion would 
be applicable at levels of phosphatic manuring other 
than those used in the experiment. The data here 
indicate that the return from nitrogen would be 
definitely higher with higher phosphatic dressings 
than those used. The inference as to the return 
with the actual phosphatic dressings used is, however,
direct and independent of the interaction.

Each of the other elements into which the effects
of treatments have been analysed may be treated 
independently, as we have treated N. A glance at 
the items of the expanded analysis of variance will 
show on which of these decisive evidence has been 
obtained. 

TABLE 33

Component of        Degrees of        Mean
Treatment.          Freedom.          Square.
N                       1              77,191
P₁                      1             162,855
P₂                      1               2,016
NP₁                     1               6,112
NP₂                     1                   5

Total                   5             248,179

Error                  20               1,527
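
The glance at Table 33 can be made explicit. In the sketch below each mean square on one degree of freedom is turned into “Student’s” t by taking the square root of its ratio to the error mean square, and compared with the 5 per cent. value 2.086 for 20 degrees of freedom. The last function expresses the limits for a component as fractions of the observed effect, as was done for nitrogen. The dictionary keys and function names are illustrative.

```python
from math import sqrt

MS_ERROR = 1527.0
T_5PC_20DF = 2.086
MEAN_SQUARES = {"N": 77191.0, "P1": 162855.0, "P2": 2016.0, "NP1": 6112.0, "NP2": 5.0}


def t_value(mean_square):
    """Student's t for a single degree of freedom tested against error."""
    return sqrt(mean_square / MS_ERROR)


def admissible_fractions(t_obs, t_crit):
    """Limits of the true effect, as fractions of the observed effect."""
    return (t_obs - t_crit) / t_obs, (t_obs + t_crit) / t_obs


t = {name: t_value(ms) for name, ms in MEAN_SQUARES.items()}
decisive = [name for name, value in t.items() if value > T_5PC_20DF]
n_low, n_high = admissible_fractions(t["N"], T_5PC_20DF)

for name, value in t.items():
    print(name, round(value, 3))
print(decisive, n_low, n_high)
```

On this two-sided criterion only N and P₁ exceed the 5 per cent. value; NP₁ falls just short of it, and P₂ and NP₂ are well within the range of random error.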


Thus the primary effect of phosphate, like that of 
nitrogen, is demonstrated with unquestioned signifi- 
cance, and the magnitude of the return evaluated with 
fair accuracy. On the other hand, the contribution 
of the component NP₂ is much less than might have
appeared as the result of random errors. The results 
are entirely compatible with the theoretical possibility 
that the response to nitrogen at different levels of 
phosphatic application changes in strict proportion to 
the amount of phosphate applied. The two remaining 
items, for P₂ and NP₁, are intermediate in magnitude,
indicating that the evidence of the experiment
on two corresponding modes of varying the null
hypothesis is of an intermediate character. For P₂
the contribution, 2016, is statistically insignificant.
The experiment does not prove that the additional
response to the second dose of phosphatic manure is
certainly less than that to the first. It is in fact less,
in accordance with common agricultural experience, 
but the experiment could not suffice by itself to 
demonstrate the reality of this decrease. If we 
consider a series of hypotheses with different values 
for the diminishing return, and determine which of 
these values are compatible, at any given level of 
significance, with the observed yields, some of the 
values which would appear to be acceptable would 
be negative, i.e. would represent increasing returns, 
though in the greater part of the acceptable range 
positive values would prevail. Even if the test of 
significance were chosen, so as to determine not both 
limits, but the lower level only, this lower limit would 
be found to be negative. For t about 1.15, the
fiducial probability of “increasing return” is about
13 per cent.

In the case of NP₁, which measures the extent to
which the response to nitrogen is increased by an 
increased phosphatic dressing, the state of the evidence 
is somewhat different. The value of “Student’s” t
is 2.0004, while the value which is exceeded either
positively or negatively in 5 per cent. of trials is
2.086. If, therefore, the experimenter had no more
reason to expect an increasing than a decreasing 
response the observed value would have fallen just 
short of 5 per cent. significance. Since, however,
normal agricultural experience would lead us to 
anticipate an increase, while a decrease would be 
somewhat anomalous, this test tells us not only that 
the observed magnitude of the effect is nearly
significant, but also that it is in the right direction.
These two independent pieces of evidence are combined
by choosing a test, which determines a lower
fiducial limit only, i.e. by taking for comparison the
value of t, 1.725, tabulated as corresponding to the
probability 10 per cent. This test shows that, at the
5 per cent. level of significance, any hypothesis which
gives to the increase, in response to nitrogen, a 
negative value, is contradicted by the experimental 
results, or, in other words, that a positive effect is 
demonstrated, at this level of significance, by the 
experiment. As in other cases, where an effect is 
little more than barely significant, the precision with 
which its value is estimated is, of course, extremely 
low. 
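
The difference between the two ways of testing NP₁ is easily set out. The sketch uses only the three values of t quoted above; the function names, and the expression of the lower limit as a fraction of the observed effect, are illustrative additions.

```python
T_NP1 = 2.0004            # observed t for NP1, positive in the expected direction
T_TWO_SIDED_5PC = 2.086   # exceeded in either direction in 5 per cent. of trials
T_TABULATED_10PC = 1.725  # exceeded in either direction in 10 per cent. of trials,
                          # hence in one named direction in 5 per cent. of trials


def significant_two_sided(t):
    """Test with no prior expectation as to the direction of the effect."""
    return abs(t) > T_TWO_SIDED_5PC


def significant_lower_limit_only(t):
    """Test which sets a lower fiducial limit only, the direction being named in advance."""
    return t > T_TABULATED_10PC


lower_limit_fraction = (T_NP1 - T_TABULATED_10PC) / T_NP1
print(significant_two_sided(T_NP1), significant_lower_limit_only(T_NP1))
print(lower_limit_fraction)
```

The lower limit is positive but lies at only about a seventh of the observed effect, which illustrates how low the precision of the estimate is.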

It would have been legitimate to choose other 
comparisons among the treatments employed, and to 
make with them other tests of significance. We 
might, for example, have compared the plots receiving 
double, directly with those receiving single phos- 
phate, and have discussed the significance of this 
difference, in isolation from the other experimental
results. The only inconvenience of such a course is 
that if, as is usually the case, the result is to be used 
in the examination of scientific theory, in the framing 
of practical advice, or in the designing of future 
experiments, in conjunction with facts of the same 
kind as the remainder of the experiment provides, it 
is clearly preferable that the whole of these should be 
recognised by means of a series of independent tests, 
each having some agricultural relevance. Although
a series of tests can always be chosen, independent
of the one with which we may start, the supplementary 
information provided by them will often be of too 
complicated a kind for its bearing on our effective 
conclusions to be readily appreciated. Consequently, 
it will usually be preferable, as in the example chosen, 
to design the experiment so as to lead uniquely to a 
single series of tests chosen in advance. 

Where a number of independent tests of signifi- 
cance have been made, on data from the same 
experiment, each test allowing of the rejection of 
the true hypotheses in 5 per cent. of trials, it follows
that a hypothesis specifying all the differences in 
yield between the treatments tested will, although 
true, be rejected with a higher frequency. If, there- 
fore, it were desired to examine the possible variations 
of any hypothesis which specified all these differences 
simultaneously while maintaining the 5 per cent. level
of significance, a different procedure should be 
adopted. Actually, in biology or in agriculture, it 
is seldom that the hypothetical background is so 
fully elaborated that this is necessary. It is, therefore, 
usually preferable to consider the experiment, as we 
have done above, as throwing light upon a number 
of theoretically independent questions. There is,
however, no difficulty, when required, in making a
comprehensive test on all questions simultaneously,
as in the z test first employed, or in extending this
test so as to specify the aggregate of compound
hypotheses which are contradicted by the experiment
at any assigned level of significance.

In analysing the 5 degrees of freedom ascribable
to treatments in the 6 × 6 Latin square, certain
differences were obtained from the experimental 
yields, such as the 1667 units of yield by which the 
plots receiving nitrogen exceeded the remainder. 
Any hypothesis respecting the difference in yield of 
the six treatments used, may be specified by the 
hypothetical values which it gives corresponding to 
these observed differences. Thus if a₁ were the
hypothetical value corresponding to 1667, a₂ corresponding
to 1977, and so on, the sum of squares
for the 5 degrees of freedom representing the deviations
of the observed responses to treatments from
those predicted by hypothesis would be

\[ \frac{(1667-a_1)^2}{36} + \frac{(1977-a_2)^2}{24} + \frac{(381-a_3)^2}{72} + \frac{(383-a_4)^2}{24} + \frac{(19-a_5)^2}{72}. \]

We may now find how large this expression must
be in order that z should be equal to its 1 per cent.
value. This value as given by the table is 0.7058.
Adding to this ½ log_e for error, .2117, we have .9175
corresponding to a mean square 6265, or to a sum of
squares 31,325. The 1 per cent. test of significance
for a hypothesis specifying all the values a₁, a₂, . . .,
a₅ will, therefore, reject any hypothesis for which
the quadratic expression set out above exceeds 31,325
and will accept all hypotheses for which it has a
lower value.
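
Two points from the preceding paragraphs can be illustrated together: the higher frequency with which a true compound hypothesis is rejected when five independent 5 per cent. tests are applied, and the comprehensive 1 per cent. test of a hypothesis specifying all five values at once. The sketch works directly from the error mean square, 1527, rather than from the tabulated half-logarithms, and the trial hypothesis at the end is invented for illustration.

```python
from math import exp

OBSERVED = [1667, 1977, 381, 383, 19]   # observed differences for N, P1, P2, NP1, NP2
DIVISORS = [36, 24, 72, 24, 72]
MS_ERROR = 1527.0                       # error mean square, 20 degrees of freedom
Z_1PC = 0.7058                          # 1 per cent. value of z for 5 against 20


def familywise_rejection_rate(level, n_tests):
    """Chance that at least one of n independent tests rejects a true hypothesis."""
    return 1.0 - (1.0 - level) ** n_tests


def quadratic_expression(hypothesis):
    """Sum of squares of deviations of the observed from the hypothetical values."""
    return sum((obs - a) ** 2 / div
               for obs, a, div in zip(OBSERVED, hypothesis, DIVISORS))


# Since z is half the natural logarithm of the ratio of mean squares, the
# limiting treatment mean square is MS_ERROR * exp(2 z), on 5 degrees of freedom.
THRESHOLD = 5 * MS_ERROR * exp(2.0 * Z_1PC)


def rejected_at_1pc(hypothesis):
    return quadratic_expression(hypothesis) > THRESHOLD


print(familywise_rejection_rate(0.05, 5))
print(THRESHOLD)
print(rejected_at_1pc([0, 0, 0, 0, 0]))            # no treatment effects at all
print(rejected_at_1pc([1600, 1900, 300, 300, 0]))  # hypothetical trial values
```

The threshold so computed agrees with 31,325 to within the rounding of the tabulated logarithms. Five independent tests at the 5 per cent. level would reject a true compound hypothesis in rather more than a fifth of trials.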

65. Comparisons with Interactions

The last class of variation to be considered in 
the tests of significance derivable from the analysis 
of variance, consists of cases in which we compare 
primary effects with interactions, or interactions with 
interactions of a higher order. If, for example, a 
test were carried out of five varieties of an agricultural 
plant, using a Latin square laid down at each of ten
representative farms, in a region to which the five 
varieties tested have all some claim to be thought 
appropriate, the experiment at each farm will provide 
an analysis of the form : 

TABLE 34

                    Degrees of
                    Freedom.
Rows                    4
Columns                 4
Treatments              4
Error                  12
Total                  24


If we have corresponding data for each of ten places 
the whole series will yield together 40 degrees of 
freedom for rows and 40 degrees for columns, all of 
which represent components of heterogeneity which 
have been eliminated. There will also be 120 degrees 
of freedom in all for error. But the remaining 40, 
composed of the components ascribed to variety at 
each of the ten places is divisible into 4 degrees of 
freedom for variety V, and 36 for interaction between 
variety and place VP. There would, of course, also 
be 9 degrees of freedom, representing the contrast
between places, but with these we are not concerned.
The complete analysis of such a record of 250 yields 
would, therefore, be as follows : 

TABLE 35 

                     Degrees of 
                      Freedom. 
Rows . . . . . .         40 
Columns  . . . .         40 
Places . . . . .          9 
Varieties  . . .          4 
V x P  . . . . .         36 
Error  . . . . .        120 
                        --- 
Total  . . . . .        249 
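The partition of degrees of freedom in Table 35 can be checked mechanically. The following is only a sketch: the function name and its arguments are illustrative and are not taken from the text, and it assumes one Latin square of the same size at every place.

```python
def combined_latin_square_df(side, places):
    """Degrees of freedom for `places` independent side x side Latin squares.

    Illustrative helper (not from the text): rows and columns are eliminated
    within each square; varieties are the same at every place.
    """
    per_square = side - 1
    return {
        "Rows": places * per_square,
        "Columns": places * per_square,
        "Places": places - 1,
        "Varieties": per_square,
        "V x P": per_square * (places - 1),
        "Error": places * (side - 1) * (side - 2),
        "Total": places * side * side - 1,
    }


table = combined_latin_square_df(side=5, places=10)
for source, df in table.items():
    print(f"{source:<10}{df:>5}")
```

With five varieties and ten places the components add up to the total for 250 yields, which is a useful check before any sums of squares are computed.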


It would be proper, of course, to examine the 
record from each farm for significant differences 
between the varieties, for even if these were not 
concordant they might indicate a greater aptitude 
of some varieties compared with others to the soil 
conditions of a particular site. Even in the absence 
of significant differences on individual farms, the 
results of the different experiments might be sufficiently 
concordant to give a significant comparison in the 
analysis of the entire experiment between varieties 
and error. This might not, however, be the most 
appropriate comparison to make, for since the varieties 
might react differently to different types of soil, it is 
not improbable that the mean square corresponding 
to the 36 degrees of freedom VP, is greater than the 
mean square due to error. If the precision of the 
individual experiments were high, the difference 
between the aggregate yields of two varieties might 
be significant compared with error, although one was 
the better at only six places, while the other was 
better at the remaining four. In fact, if our concern 
is to ascertain not merely the best variety on the 
aggregate of the ten fields actually used, but to 
ascertain which is the best over the whole area 
deemed suitable for this type of crop, within the 
region from which the sites of the experiment have 
been selected, the comparison between varieties V 
and interaction of varieties and places VP will be 
the more appropriate. For, if the ten sites have 
been chosen at random from this area, a significant 
difference in this comparison would indicate, at the 
level of significance used, varietal differences appli- 
cable to the whole area. The precision of this 
comparison may not be greatly increased by higher 
precision in the individual experiments, especially if 
the mean square corresponding to VP is consider- 
ably greater than that ascribable to experimental 
error. To increase its precision we may rather 
require an increase in the number of sites used, or 
in other words, if the area sampled is considerably 
heterogeneous with respect to varietal response, it 
may be necessary to sample it more thoroughly. The 
hypothetical population with which we are principally 
concerned, will then be the population of possible 
sites available for growing the crop under considera- 
tion, rather than the population of possible yields of 
plots within a given site. The test employed is, in 
fact, equivalent to considering, from each farm, only 
the aggregate yield of each variety, and the estimation 
of error within each individual Latin square is of 
value, apart from the local information supplied, only 
in providing assurance that the experimentation has 
been carried out with an exactitude sufficient to 
guarantee the adequacy of the comparisons between 
different places. 

Cases in which it is one of the higher order 
interactions, rather than error proper, that should 
appropriately be used as a basis for tests of 
significance, are relatively numerous. The data of 
Table 7 a (p. 76) are of this kind. Agricultural 
experiments, whether with manures, implements of 
cultivation or varieties of crop plants, are much 
affected by the weather. If a treatment effect is 
significant, compared with error in any one year, the 
experiment will have indicated what treatments have 
in that year proved most advantageous. But, if 
independent experiments over a series of years show 
a significant difference between treatments on the one 
hand, and the interaction between treatments and 
years on the other, the experiment has shown what 
treatments are the most successful in an aggregate 
of seasons, of which those experienced may be taken 
as a random sample. There seems, in fact, in no 
part of the world to be any such similarity between 
successive seasons as would make the experience of 
a sequence of trials unreliable for future application 
in the absence of genuine secular changes of the 
climate. 

The same principle is of wide application in 
economic and sociological enquiries where, in com- 
parisons of rates of death, morbidity, births, prices 
and so on, the effective unit is far more often a 
district, or a town, than an individual. The supposi- 
tion that rates, based on the registration of individuals, 
possess the precision which would be appropriate if 
all the individuals concerned could be regarded as 
independent in their sociological reactions, is clearly 
inappropriate when we are interested in the effects on 
these reactions of economic or legislative causes, or 
other agencies derived from social organisation, liable 
to affect large numbers of individuals in a similar 
manner. The effective samples available for ad- 
ministrative decisions, even though based ultimately 
on millions of individual persons, are often much 
smaller than those available in biological experimenta- 
tion and, for this reason, require even more than the 
latter, the accurate methods of analysis by which small 
samples may be interpreted. 

THE MEASUREMENT OF AMOUNT OF 
INFORMATION IN GENERAL 

66. Estimation in General 

The situations we shall now examine are of a more 
general character than those considered in the classical 
theory of errors, which have been dealt with in 
previous chapters. It has been seen in the last chapter 
that we may be interested to interpret the data as 
arising, subject to errors of unknown magnitude, but 
distributed normally, from one or more unknown 
quantities, parameters, of which we are interested 
to form estimates, of known precision, and to make 
this precision as great as possible. In the most 
general situation of this kind, all the different kinds 
of individual events which it is possible to observe, 
are regarded as occurring with frequencies function- 
ally dependent in any way on one or more of such 
unknown parameters. This is the general situation 
considered in the Theory of Estimation. From the 
purely statistical standpoint, they present the problem 
of how best the observations can be combined, in 
order to afford the most precise estimates possible of 
the unknowns. The mathematical principles of this 
process of combination are now satisfactorily under- 
stood, and have been illustrated in detail in the ninth 
chapter of the author’s book, Statistical Methods for 
Research Workers. From the point of view of the 
practical design of experiments, or of observational 
programmes, we shall here be concerned only 
indirectly with the technique of the calculation of 
efficient estimates, and can turn attention at once to 
the problem of assessing, in any particular case which 
arises, the quantity of information which the data 
supply, and which we may assume will be efficiently 
utilised. 

The reason for this standpoint, so contrary to that 
traditional among statisticians, deserves some explana- 
tion. During the period in which highly inefficient 
methods of estimation were commonly employed, and, 
indeed, strongly advocated by the most influential 
authorities, it was natural that a great deal of ingenuity 
should be devoted, in each type of problem as it arose, 
to the invention of methods of estimation, with the 
idea always latent, though seldom clearly expressed, 
of making these as accurate as possible. The attain- 
ment of a result of high accuracy was, in fact, evidence 
not only of the intrinsic value of the data examined, 
but also to some extent of the skill with which it had 
been treated. The extent to which this was so was 
the greater the more inefficient were the methods 
ordinarily recommended ; but, clearly, in any subject 
in which the statistical methods ordinarily employed 
leave little to be desired, the precision of the result 
obtained will depend almost entirely on the value of 
the data on which it is based, and it is useless to 
commend the statistician, if this is great, or to reproach 
him if it is small. At the present time any novice in 
the theory of estimation should be able to set out the 
calculations necessary for making estimates, almost, 
if not quite, as good as they can possibly be. Any 
improvements which can be made by further refine- 
ments of computational technique are, in ordinary 
cases, and setting gross incompetence aside, exceed- 
ingly small compared to the improvements which may 
be effected in the observational data. 

The amount of information to be expected in 
respect of any unknown parameters, from a given 
number of observations of independent objects or 
events, the frequencies of which depend on that 
parameter, may be obtained by a simple application 
of the differential calculus. It may be worth while 
to consider a few easy examples in detail, in order to 
obtain a clear grasp of the process generally involved. 

67. Frequencies of Two Alternatives 

Let us suppose that only two kinds of objects or 
events are to be distinguished, and that we are con- 
cerned to estimate the frequency, p, with which one 
of them occurs as a fraction of all occurrences ; or, 
what comes to the same thing, the complementary 
frequency, q (= 1 - p), with which the alternative 
event occurs. We might, for example, be estimating 
the proportion of males in the aggregate of live births, 
or the proportion of sterile samples drawn from a bulk 
in which an unknown number of organisms are 
distributed, or the proportion of experimental animals 
which die under well-defined experimental conditions. 
The experimental or observational record will then 
give us the numbers of the two kinds of observa- 
tions made, a of one kind and b of another, out of 
a total number of n cases examined. We wish to 
know how much information the examination of n 
cases may be expected to provide, concerning the 
values of p and q, which are to be estimated from 
the data. 

A general procedure, which may be easily applied 
to many cases, is to set down the frequencies to be 
expected in each of the distinguishable classes in terms 
of the unknown parameter. For each class we then 
find the differential coefficient, with respect to p, 
of this expectation. The squares of these, divided 
by the corresponding expectations, and added 
together, supply the amount of information to be 
anticipated from the observational record. That 
such a calculation will give a quantity of the kind 
we want, may be perceived at once by considering 
that the differential coefficients of the expectations, 
with respect to p, measure the rates at which these 
expectations will commence to be altered if p is 
gradually varied ; and the greater these rates are, 
whether the expectations are increased or diminished 
as p is increased, or in other words, whether the 
differential coefficients are positive or negative, the 
more sensitively will the expectations respond to 
variations of p. Consequently, it might have been 
anticipated that the value of the observational record 
for our purpose would be simply related to the squares 
of these differential coefficients. 
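The general procedure just described can be sketched in a few lines. The function names below are illustrative only and are not part of the text; the sketch assumes that the expectations and their differential coefficients are supplied class by class, in the same order.

```python
def information(expectations, derivatives):
    """Sum over classes of (dm/dp)^2 / m, the amount of information."""
    return sum(d * d / m for m, d in zip(expectations, derivatives))


def two_class_information(n, p):
    """Two alternatives: expectations pn and qn, derivatives n and -n."""
    q = 1.0 - p
    return information([n * p, n * q], [n, -n])


# Hypothetical illustration: 100 cases with p = 0.2.
print(two_class_information(100, 0.2))
print(100 / (0.2 * 0.8))  # the closed form n / (p q), for comparison
```

The same `information` helper applies unchanged to any number of classes, which is what makes the procedure convenient for the more elaborate cases treated later.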

We may now set out the process of calculation for 
the simple case of the estimation of the frequency of 
one of two classes. 

TABLE 36 

Observed        Expected        Differential 
Frequency.      Frequency.      Coefficient.      $\frac{1}{m}\left(\frac{dm}{dp}\right)^2$ 
  (x)             (m)            $dm/dp$ 

   a              pn                n                 $n/p$ 
   b              qn               -n                 $n/q$ 
 -----          -----            -----              ------- 
   n               n                0                 $n/pq$ 



The frequencies expected are found by multiplying 
the number of observations, n, by the theoretical 
frequency, p, which is the object of estimation, and 
by its complementary frequency, q. The differential 
coefficients of these expectations with respect to p are 
simply n and — n. The sum of these is zero, as must 
be the case, whenever, as is usual, the number of 
observations made is independent of the parameter 
to be estimated. It is obviously, therefore, not the 
total of the differential coefficients which measures 
the value of the data, but effectively the extent to 
which these differ in the different distinguishable 
classes, as measured by their squares appropriately 
weighted, as shown in the last column. 

The total amount of information is found to be 

$$I_p = \frac{n}{pq},$$

and we may now note the well-known fact that, if 
our sample of observations were indefinitely increased, 
the estimate of p, obtained from the data, tends in the 
limit to be distributed normally about the true value 
with variance $pq/n$. The general method here given of 
measuring quantity of information thus agrees with 
the concept, which has been formed of this quantity 
in previous chapters, where we were concerned only 
with normally distributed errors. 
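That the reciprocal of the amount of information is the sampling variance of the estimate can be confirmed directly for this simple case. The sketch below assumes the estimate of p is the observed fraction a/n and sums over the exact binomial distribution; the function name is illustrative.

```python
from math import comb


def exact_variance_of_proportion(n, p):
    """Exact sampling variance of a/n when a is binomial with index n."""
    q = 1.0 - p
    weights = [comb(n, a) * p**a * q ** (n - a) for a in range(n + 1)]
    mean = sum(w * a / n for a, w in enumerate(weights))
    second_moment = sum(w * (a / n) ** 2 for a, w in enumerate(weights))
    return second_moment - mean**2


# Hypothetical illustration: 50 cases with p = 0.3.
n, p = 50, 0.3
print(exact_variance_of_proportion(n, p))
print(p * (1 - p) / n)  # reciprocal of the amount of information n / (p q)
```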

68. Functional Relationships among Parameters 

It is often true that the frequency of a particular 
event among different events of a like kind is itself 
an object of enquiry, as is the case, for example, with 
the sex-ratio of births. More often the frequency 
is itself only of value because it is believed to be 
functionally related to some other quantity of more 
direct importance. The frequency, p, of sterile 
samples from a vessel containing an unknown density 
of organisms is related to the average number, m, of 
organisms in the sampling unit, by the relation, 

$$p = e^{-m},$$
$$m = -\log p,$$

where the logarithm is taken from a table prepared on 
the natural or Napierian system. 

If now the object of making a count discriminating 
only the two types of sample, viz., sterile samples 
which contain no organism, and fertile samples which 
contain at least one, is to make an estimate of the 
density in the material sampled, or in the material 
from which the dilution sampled was prepared, we 
shall be interested, not directly in the amount of 
information about p, but rather in the amount of 
information about m, which the sample provides. 
Since m and p are functionally related, this can be 
obtained by using the relationship directly, from the 
amount of information about p. 

If in the table set out above, showing the calcula- 
tion of the amount of information respecting p supplied 
by n observations, we had differentiated with respect 
to m, instead of with respect to p, the process would have 
led to the amount of information with respect to m. 
The component terms of this calculation would each 
have differed from those we obtained only in 
containing, as an additional factor, the square of the 
differential coefficient of p with respect to m. In 
general, if $I_m$ stands for the amount of information 
with respect to m, and $I_p$ for the amount of informa- 
tion with respect to p, we have the transformation 
formula 

$$I_m = \left(\frac{dp}{dm}\right)^2 I_p .$$

In the present case 

$$p = e^{-m},$$

whence 

$$\frac{dp}{dm} = -p .$$

And since 

$$I_p = \frac{n}{pq},$$

it follows that 

$$I_m = \frac{np}{q} .$$

As in the case of p, if the number of samples 
examined is increased, our estimates of m derived 
from a given number of samples, tend to be normally 
distributed about the true value, and the variance of 
this limiting distribution is given by the reciprocal 
of the amount of information, 

$$V(m) = \frac{q}{pn} = \frac{e^m - 1}{n} .$$
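The transformation from p to m may be checked numerically in two ways: by differentiating the expectations with respect to m directly, and by applying the transformation formula to the information about p. The sketch below uses illustrative function names and a hypothetical n and m.

```python
from math import exp


def info_about_m(n, m):
    """Return the information about m computed directly and by transformation."""
    p = exp(-m)
    q = 1.0 - p
    # Direct route: expectations n*p and n*q have derivatives -n*p and +n*p
    # with respect to m, since dp/dm = -p.
    direct = (n * p) ** 2 / (n * p) + (n * p) ** 2 / (n * q)
    # Transformation route: (dp/dm)^2 times the information about p.
    transformed = p * p * n / (p * q)
    return direct, transformed


def variance_of_m(n, m):
    """Limiting variance (e^m - 1) / n of the estimate of m."""
    return (exp(m) - 1.0) / n


direct, transformed = info_about_m(200, 1.2)
print(direct, transformed, 1.0 / variance_of_m(200, 1.2))
```

All three printed quantities should agree, which is the content of the transformation formula in this case.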

The errors of estimation are least when m is near to 
zero, and increase rapidly if m is made large. This, 
however, does not mean that the determination will 
be most accurately carried out with very high dilu- 
tions, or with very small sampling units, by which 
means m may be made as small as we please ; for it 
must be remembered that if we reduce the sampling 
errors of m by making m smaller, we will not 
necessarily diminish the relative magnitude of these 
errors when compared with m. To minimise the 
relative magnitude of the sampling errors, we need 
to consider the variance of m divided by $m^2$ ; or 

$$\frac{1}{m^2} V(m) = \frac{1}{n} \cdot \frac{e^m - 1}{m^2} .$$


This quantity tends to infinite values when m is made 
either very small or very large. Since the logarithm 
of m possesses the property that $\frac{d}{dm} \log m = \frac{1}{m}$, 
it appears equally that the amount of information 
supplied by the experiment relative to log m is given 
by the expression 

$$\frac{n m^2}{e^m - 1} .$$


This quantity vanishes as m tends to zero or infinity, 
but is finite at all intermediate values ; and the relative 
precision of our estimate of the number of organisms 
will be greatest if the dilution or the sampling unit 
were adjusted, so as to maximise this quantity. 

Fig. 3 shows the quantity of information, for all 
values of m, for which this quantity is not very small. 
The horizontal scale is logarithmic, so that values of 
m indicated at equal intervals are in geometric 
progression. 

The absolute maximum of information is given when 
m is about 1.6, or, more precisely, a number $\hat{m}$ with 
rather remarkable properties, such that 

$$1 - e^{-\hat{m}} = \tfrac{1}{2}\hat{m} ;$$

since 

$$p = e^{-\hat{m}},$$

it follows that the ratio of p to q is the same as 
the ratio of $(2 - \hat{m})$ to $\hat{m}$. Numerically, it appears 
that 

$$\hat{m} = 1.59362426$$
$$1 - e^{-\hat{m}} = 0.79681213 = \tfrac{1}{2}\hat{m} = q$$
$$2 - \hat{m} = 0.40637574$$
$$e^{-\hat{m}} = 0.20318787 = 1 - \tfrac{1}{2}\hat{m} = p$$

The ideal proportion of sterile samples for estimat- 
ing by this method the density of the organisms 
is, therefore, just over 20 per cent. Any proportion 
between 10 per cent. and 33 per cent. sterile will, how- 
ever, supply nearly as much information, and the aim 
in adjusting the sampling process should be to obtain 
a percentage of sterile samples between these limits. 
The maximum amount of information per sample is 

$$i = 0.64761024 = \hat{m}(2 - \hat{m}) = 4pq .$$
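The optimal density and the corresponding maximum of information can be recovered numerically. The sketch below solves the condition 1 - e^(-m) = m/2 by bisection; the function names, the bracketing interval and the tolerance are choices made here for illustration and are not taken from the text.

```python
from math import exp


def info_log_m_per_sample(m):
    """Information about log m from a single sample: m^2 / (e^m - 1)."""
    return m * m / (exp(m) - 1.0)


def optimal_m(lo=1.0, hi=2.0, tol=1e-12):
    """Root of 1 - exp(-m) - m/2 between lo and hi, found by bisection."""

    def g(m):
        return 1.0 - exp(-m) - m / 2.0

    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if g(lo) * g(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0


m_hat = optimal_m()
p = exp(-m_hat)
q = 1.0 - p
print(m_hat, p, q)
print(info_log_m_per_sample(m_hat), m_hat * (2.0 - m_hat), 4.0 * p * q)
```

The last line prints the maximum in its three equivalent forms, so any slip in transcribing the digits shows up at once as a disagreement between them.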

To find the minimum number of samples needed to 
estimate m with any given precision, we may now 
equate $ni$ to the invariance of log m required. Thus 
if we required to reduce the standard error of m to 
10 per cent. of its value, we might put the standard 
error of log m equal to 0.1 ; the variance of log m 
would then be 0.01 and its invariance would be 100. 
We should then have 

$$ni = 100,$$

or 

$$n = 154.4 .$$

Even in the most favourable circumstances, therefore, 
it would need 155 samples to reduce the standard 
error below 10 per cent. of the estimated density. 
Since, owing to our ignorance of the true density, the 
dilution cannot be adjusted exactly so as to give the 
ideal proportion of sterile samples, it would usually be 
wise to divide the amount of information required by 
a smaller divisor than the maximum value 0.6476, 
e.g. by 0.6, which would raise our estimated require- 
ment to 167 samples. The whole calculation shows 
that the method of estimating the density of organisms 
by discriminating only their presence, or absence, in 
samples is of low precision, compared with methods 
in which individuals or colonies may be counted. 
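The sample-size calculation above amounts to dividing the invariance required by the information supplied per sample. A minimal sketch, with an illustrative function name and with the divisors entered by hand as assumptions:

```python
from math import ceil


def samples_needed(relative_standard_error, information_per_sample):
    """Number of samples so that the standard error of log m is as required."""
    required_invariance = 1.0 / relative_standard_error**2
    return required_invariance / information_per_sample


# Divisors assumed here: the maximum found above (about 0.6476) and the
# more cautious value 0.6 suggested in the text.
for divisor in (0.6476, 0.6):
    n = samples_needed(0.10, divisor)
    print(divisor, round(n, 1), ceil(n))
```

Rounding upward is deliberate, since a fractional sample cannot be taken and the requirement is a minimum.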

In many types of research a series of dilutions is 
employed, giving densities falling off in geometric 
progression, with a constant factor most commonly 
of 2 or 10. The amounts of information supplied by 
each of these is represented in the diagram (Fig. 3) 
by the heights of a series of equally spaced ordinates. 
If the series is extended so as to cover all densities 
which supply an appreciable amount of information, 
the sum of the ordinates for two-fold dilution is nearly 
constant in value and has an average value 

$$\frac{\pi^2}{6 \log 2} = 2.3731 .$$

This is, therefore, the amount of information supplied 
by a single sample at each dilution, and this may be 
used to calculate the precision to be expected, using 
any number of samples at each dilution, or to calculate 
the number of samples required to attain any 
stipulated level of precision. For four-fold dilutions 
the average amount of information supplied is, of 
course, a half, and for eight-fold dilutions one-third 
of the number found above. For ten-fold dilutions 
it is about three-tenths, but for the higher dilution 
ratios the sum of the ordinates shows a rapidly 
increasing variation, with the consequence that the 
amount of information actually obtained becomes 
less reliable, the larger the dilution-ratio employed. 
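The behaviour of the sum of ordinates for different dilution ratios can be examined numerically. The sketch below is an illustration only: the function names, the number of terms and the grid of starting densities are assumptions made here. It uses the per-sample information about log m, m^2/(e^m - 1), and the fact that its integral with respect to log m is pi^2/6, so that the average total for a dilution ratio r is pi^2/(6 log r).

```python
from math import exp, expm1, log, pi


def info_log_m(m):
    """m^2 / (e^m - 1), written so that very large m underflows to zero."""
    return m * m * exp(-m) / (-expm1(-m))


def series_total(ratio, start, steps=40):
    """Sum of ordinates over densities start * ratio**k, k = -steps..steps."""
    return sum(info_log_m(start * ratio**k) for k in range(-steps, steps + 1))


def average_total(ratio, offsets=200):
    """Average of the series total over starting densities within one step."""
    totals = [series_total(ratio, ratio ** (j / offsets)) for j in range(offsets)]
    return sum(totals) / offsets


for ratio in (2, 4, 8, 10):
    totals = [series_total(ratio, ratio ** (j / 20)) for j in range(20)]
    print(ratio, round(average_total(ratio), 4),
          round(pi**2 / (6 * log(ratio)), 4),
          round(max(totals) - min(totals), 4))
```

The last column is the spread of the total as the starting density is shifted within one dilution step; it is negligible for two-fold dilutions and grows as the ratio is enlarged.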

69. The Frequency Ratio in Biological Assay 

Use is often made of a frequency ratio between 
two distinguishable classes, to supply a measure of 
an underlying variate, as when the toxic content of 
a drug is inferred from the mortality of experimental 
animals receiving a known dosage, or from the 
dosage required to cause a given mortality. The 
underlying theory is illustrated in Fig. 4. The curve 
represents a normal distribution with unit standard 
deviation, divided by the ordinate PM into two 
portions. The area to the left of the ordinate 
represents the proportion, p, which die under the 
treatment; the area to the right the proportion, q, 
which survive. The height of the ordinate is 
represented by z, and its distance from the central 
axis of the curve by x, taken positive to the right 
of the axis. As x increases from $-\infty$ to $\infty$, the 
proportion dying increases from 0 to 1. Knowing 
any of the three quantities x, p or z, the other two 
can be obtained from available tables. An experi- 
mental determination of the fraction, p, will therefore 
supply a corresponding determination of the deviation, 
and this is found, in a large number of cases, to 
increase or decrease proportionally with the logarithm 
of the toxic content of the dose. If, therefore, the 
relationship between dosage and mortality has been 
established for a standard preparation, the toxicity of 
any material under test may be gauged by observing 
the mortality which supervenes on a known dosage 
of the material to be tested. Moreover, so long as 
a linear relation holds between the toxic content 
measured logarithmically and the deviation, x, the 
precision of the assay will be proportional to the 
precision with which x is estimated. 

As is seen from the figure, if x is increased by a 
small quantity dx, the initial increase of p is z dx. 
Hence 

$$\frac{dp}{dx} = z .$$

But we know that the amount of information with 
respect to x is given by the equation 

$$I_x = \frac{n z^2}{pq},$$

where n is the number of animals employed. Although 
$1/pq$ is least when p equals q, or at 50 per cent. mortality, 
the quantity of information, $nz^2/pq$, is greatest at this 
point. Hence for a single test, the highest precision 
is obtained for a given number of animals by adjusting 
the dosage approximately to the 50 per cent. death 
point. The quantity 

$$\frac{n z^2}{pq}$$

is used further as the weight to be assigned to the 
estimated value of x when a number of tests at 
different dosages are to be combined. The quantity 
5 + x, known as the “probit value,” is used as a 
practical measure of mortality, and Dr C. I. Bliss 
has given tables of the weighting factor, and other 
relationships needed for the more complex problems 
which arise in toxicological research. Fig. 5 shows 
the amount of information respecting the probit value 
supplied by each animal observed, for different 
percentage mortalities. It will be seen that when 
the mortality is between one-third and two-thirds, 
the information gained falls little short of the highest 
possible. 
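The amount of information per animal, z^2/pq, is easily tabulated for any percentage mortality. The sketch below relies on the standard normal distribution from the Python standard library; the function name and the mortalities chosen are illustrative.

```python
from math import pi
from statistics import NormalDist


def probit_information_per_animal(p):
    """z^2 / (p q), where z is the normal ordinate at the deviate x for p."""
    standard = NormalDist()
    x = standard.inv_cdf(p)  # deviation corresponding to mortality p
    z = standard.pdf(x)      # height of the ordinate at x
    return z * z / (p * (1.0 - p))


best = probit_information_per_animal(0.5)
print(best, 2.0 / pi)  # at 50 per cent. mortality the value is 2 / pi
for mortality in (0.05, 0.2, 1 / 3, 0.5, 2 / 3, 0.8, 0.95):
    share = probit_information_per_animal(mortality) / best
    print(round(mortality, 3), round(share, 3))
```

The second column gives the information as a fraction of the maximum, from which the loss incurred by working away from the 50 per cent. point can be read off.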

70. Linkage Values inferred from Frequency Ratios 

When an organism receives from its two parents 
corresponding genes of different kinds, it generally 
hands on one kind to half its offspring and the other 
kind to the remainder. The numbers of the two kinds 
of offspring observed may, however, differ, either by 
chance or owing to the unequal viability of these 
two kinds. The parent is said to be heterozygous for 
the Mendelian factor in question. If the parent is 
heterozygous for two different factors, which are not 
linked in inheritance, he may make contributions of 
four different kinds to the germinal constitution of 
the offspring, and these will occur in equal numbers. 
If, however, the factors are linked, or carried in the 
germ-plasm by the same chromosome, the two gene- 
combinations received by the heterozygote from his 
parents will be handed on to the offspring more 
frequently than the remaining two combinations 
formed by interchanging the pairs of genes. The 
intensity of the linkage is measured, in an inverse 
sense, by the frequency among all the offspring, of 
those receiving recombinations. When the recom- 
bination frequency is small, the linkage is close ; 
when it is large, approaching 50 per cent., the linkage 
is loose. The type of mating which best tests the 
intensity of linkage is one with an organism distin- 
guishable from the heterozygote in respect of both 
factors, i.e. when there is dominance, with a double 
recessive. 

If there is no difference in mortality among the 
four distinguishable types of offspring, up to the time 
at which they can be recorded, any such mating will 
determine the linkage value, with precision limited 
only by the number of offspring. Thus, if a fraction, 
p, of recombinations is estimated from a count of n 
offspring, the amount of information available as to 
the value of p is 

$$\frac{n}{pq} .$$

If, however, the two types counted as recombinations 
have an average viability different from that of the 
two types of parental combinations, the linkage value 
so estimated will be distorted by the differential 
mortality. It is possible to overcome this difficulty 
by making up heterozygotes of the two kinds possible, 
so that the recombinations from one set of matings 
are genetically similar to the parental combinations 
from the other. Thus, if in one set the apparent 
recombination value has been raised by differential 
viability, it will have been lowered in the other set. 
If, therefore, we have a record of two such sets of 
matings as shown in the following table 

               Recom-        Parental         Total 
             binations.    Combinations. 

1st set . .    $a_1$           $b_1$           $n_1$ 
2nd set . .    $a_2$           $b_2$           $n_2$ 

we may argue, that the ratio $a_1/b_1$ has been raised 
(or lowered) in the first set, in the same proportion as 
the ratio $a_2/b_2$ has been lowered (or raised) in the 
second. Hence, if we take the geometric mean of 
these two ratios, and use, as our equation of estimation, 

$$\frac{p}{q} = \sqrt{\frac{a_1 a_2}{b_1 b_2}},$$

we shall obtain an estimate unbiased by differential 
mortality, in so far as it is caused by the factors studied. 
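The cancellation of differential viability in the geometric mean may be illustrated with invented figures. In the sketch below the counts are hypothetical, constructed from a recombination fraction of 0.2 and a relative viability of 0.8 for one pair of types; the function name is likewise illustrative.

```python
from math import sqrt


def recombination_estimate(a1, b1, a2, b2):
    """Estimate p from p/q = sqrt(a1*a2 / (b1*b2))."""
    odds = sqrt((a1 * a2) / (b1 * b2))
    return odds / (1.0 + odds)


# Hypothetical counts: in the first set the recombinants are the less viable
# types, in the second set the parental combinations are.
a1, b1 = 160, 800   # 1000 * 0.2 * 0.8 and 1000 * 0.8
a2, b2 = 200, 640   # 1000 * 0.2 and 1000 * 0.8 * 0.8

print(a1 / (a1 + b1))                          # first set alone, biased downward
print(a2 / (a2 + b2))                          # second set alone, biased upward
print(recombination_estimate(a1, b1, a2, b2))  # geometric-mean estimate
```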

To determine the precision of such an estimate, 
we may consider first the precision with which the 
quantity 

$$\log \frac{p}{q} = \log p - \log q$$

is derived from a simple frequency ratio a : b ; since 

$$\frac{d}{dp} \log \frac{p}{q} = \frac{1}{p} + \frac{1}{q} = \frac{1}{pq},$$

it follows that 

$$I_{\log p/q} = p^2 q^2 I_p = p^2 q^2 \cdot \frac{n}{pq} = npq ;$$

and since, as our estimate, we shall put 

$$p = \frac{a}{n}, \qquad q = \frac{b}{n},$$

the amount of information may be written 

$$\frac{ab}{n},$$

or, in large samples, the sampling variance of 
$\log (p/q)$ is 

$$\frac{n}{ab} = \frac{1}{a} + \frac{1}{b} .$$

Now, in estimating $\log (p/q)$ from the geometric mean 
of two observed ratios, we are taking half the sum of 
two estimates of $\log (p/q)$, and the sampling variance will 
therefore be one-quarter of the sum of the four 
reciprocals, 

$$\frac{1}{4}\left(\frac{1}{a_1} + \frac{1}{b_1} + \frac{1}{a_2} + \frac{1}{b_2}\right) .$$

Hence, 

$$I_{\log p/q} = \frac{4}{\dfrac{1}{a_1} + \dfrac{1}{b_1} + \dfrac{1}{a_2} + \dfrac{1}{b_2}} = h,$$

where h is the harmonic mean of the four frequencies 
observed. 

The information respecting the recombination 
fraction, p, estimated in this way may be calculated, 
as before, from that respecting $\log (p/q)$ and is evidently 

$$I_p = \frac{h}{p^2 q^2} .$$
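The two expressions just obtained can be put to work on a record of four counts. The sketch below uses illustrative function names and hypothetical counts; when the two sets are alike and free from differential viability, the result should reduce to the total number of offspring divided by pq.

```python
from math import sqrt


def harmonic_mean_information(a1, b1, a2, b2):
    """h = 4 / (1/a1 + 1/b1 + 1/a2 + 1/b2), the information about log(p/q)."""
    return 4.0 / (1.0 / a1 + 1.0 / b1 + 1.0 / a2 + 1.0 / b2)


def recombination_information(a1, b1, a2, b2):
    """Information about p, h / (p^2 q^2), with p from the geometric mean."""
    odds = sqrt((a1 * a2) / (b1 * b2))
    p = odds / (1.0 + odds)
    q = 1.0 - p
    return harmonic_mean_information(a1, b1, a2, b2) / (p * p * q * q)


# Hypothetical record: two sets of 100 offspring, 20 recombinants in each.
print(harmonic_mean_information(20, 80, 20, 80))
print(recombination_information(20, 80, 20, 80))
print(200 / (0.2 * 0.8))  # n / (p q) for 200 offspring, for comparison
```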


We may now ask in what proportions the two 
types of mating should be used, in order to secure the 
greatest precision for a given number of organisms 
bred and examined. 

If $p_1$, $q_1$ stand for the proportions observed from 
matings of the first kind, and $p_2$, $q_2$ from matings of 
the second kind, the amount of information has been 
shown to be inversely proportional to 

$$\frac{1}{n_1 p_1} + \frac{1}{n_1 q_1} + \frac{1}{n_2 p_2} + \frac{1}{n_2 q_2},$$

or to 

$$\frac{1}{n_1 p_1 q_1} + \frac{1}{n_2 p_2 q_2} .$$

We wish to make this quantity as small as possible, 
consistently with a fixed total of organisms observed, 
$n_1 + n_2$. If the numbers are such as to make this 
quantity a minimum, it will be unaltered by a small 
decrement in the number $n_1$, accompanied by a 
corresponding small increment in the number $n_2$. 
But if $n_1$ is diminished and $n_2$ increased by a small 
change dn, the quantity above is increased by 

$$\frac{dn}{n_1^2 p_1 q_1} - \frac{dn}{n_2^2 p_2 q_2} .$$

Hence for the most satisfactory proportion, 

$$n_1^2 p_1 q_1 = n_2^2 p_2 q_2 .$$

We should then endeavour to adjust our observational 
numbers so that 

$$a_1 b_1 = a_2 b_2 ;$$

in other words, the product of the observed frequencies 
from one set of matings should be approximately equal 
to the product from the other set. 
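The condition for the best division of a fixed total between the two types of mating can be verified numerically. The sketch below treats the numbers as continuous, uses illustrative function names, and takes hypothetical apparent recombination fractions of 0.15 and 0.25 in the two sets.

```python
from math import sqrt


def inverse_information(n1, n2, p1, p2):
    """1/(n1 p1 q1) + 1/(n2 p2 q2), to which the information is inverse."""
    return 1.0 / (n1 * p1 * (1.0 - p1)) + 1.0 / (n2 * p2 * (1.0 - p2))


def best_split(total, p1, p2):
    """Numbers n1, n2 with n1 + n2 = total and n1^2 p1 q1 = n2^2 p2 q2."""
    w1 = sqrt(p1 * (1.0 - p1))
    w2 = sqrt(p2 * (1.0 - p2))
    n1 = total * w2 / (w1 + w2)
    return n1, total - n1


n1, n2 = best_split(1000, 0.15, 0.25)
print(n1, n2)
print(n1 * 0.15 * n1 * 0.85, n2 * 0.25 * n2 * 0.75)  # products a1*b1 and a2*b2
for shift in (-50, 0, 50):
    print(shift, inverse_information(n1 + shift, n2 - shift, 0.15, 0.25))
```

The middle row of the final loop should show the smallest value, the neighbouring divisions being slightly worse on either side.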

By reversing the manner in which the frequencies 
are combined, the data may be used to estimate 
differential viability, in the same way as they are 
used to estimate the recombination frequency. The 
consequence is that the same proportionate numbers 
from the two types of mating, which are ideal for the 
estimation of linkage, are also ideal for the estimation 
of differential viability. 

71. Linkage Values inferred from the Progeny of 
Self-fertilised or Intercrossed Heterozygotes 

With many plants it is easier to ensure self- 
fertilisation than to execute controlled crossings ; 
consequently, much of the information available as to 
linkage in plants is derived from the families of self- 
fertilised heterozygotes. With animals also such data 
are obtained in the course of combining two recessives 
not yet available in combination. Methods of estimat- 
ing the intensity of linkage have been examined 
in Chapter IX. of the author's Statistical Methods, 
and analogous, but more complex cases have been 
discussed by J. B. Hutchinson and F. R. Immer. 
We are here only concerned with the evaluation in 
such problems, of the quantity of information as to 
the linkage value postulated, which the data make 
available. If there is reason to suspect differential 
viability, there is no satisfactory substitute for back- 
crossing, so we shall discuss only the case in which 
this complication is absent. 

The frequencies of the four distinguishable types 
to be expected may best be inferred from that of the 
double recessives, for this is only produced when both 
of the uniting gametes lack both dominant genes. 
When the two dominant genes have been received by 
the parents from different grandparents (repulsion), 
the proportion of such doubly recessive gametes will 
be $\tfrac{1}{2}p$, where p is the recombination fraction. The 
probability that both the uniting gametes are of this 
kind is therefore $\tfrac{1}{4}p^2$, or $\tfrac{1}{4}pp'$ if the recombination 
fractions should be different in male and female 
gametogenesis, and are represented by p and $p'$. We 
may, therefore, represent this fraction by $\tfrac{1}{4}\theta$, noting 
that, for repulsion, $\sqrt{\theta}$ will be the recombination 
fraction, or at least the geometric mean of the two 
recombination fractions, if there are two different 
values. The frequencies, in any case, are expressible 
in terms of $\theta$. Consequently, it is only of this quantity 
that the data provide information. The data provide 
no means of detecting any difference that may exist 
between p and $p'$, and we shall from this point use 
the symbol x merely as an equivalent to $\sqrt{\theta}$. In the 
case of coupling, on the other hand, the doubly 
recessive gametes will be of the parental combination, 
and the recombination fraction will be $1 - x$. 

From the expected proportion of double recessives
the proportions of the other classes may be easily
inferred from the fact that each recessive separately
must appear in one-quarter of the offspring,
irrespective of linkage. The two singly recessive
genotypes have, therefore, each a proportional
expectation of $\tfrac{1}{4}(1-\theta)$, leaving $\tfrac{1}{4}(2+\theta)$ for the last,
or doubly dominant type.

Having evaluated the expectations we may now,
as before, calculate directly the amount of information
which a record of $n$ offspring will supply as to the
value of $\theta$. The table below shows this calculation.


TABLE 37

            Offspring expected     $dm/d\theta$     $\frac{1}{m}\left(\frac{dm}{d\theta}\right)^2$
            ($m$)

            $n\theta/4$            $n/4$            $n/(4\theta)$
            $n(1-\theta)/4$        $-n/4$           $n/\{4(1-\theta)\}$
            $n(1-\theta)/4$        $-n/4$           $n/\{4(1-\theta)\}$
            $n(2+\theta)/4$        $n/4$            $n/\{4(2+\theta)\}$

  Total     $n$                    $0$              $\frac{2n(1+2\theta)}{4\theta(1-\theta)(2+\theta)}$

from which it appears that

$$I_\theta = \frac{n(1+2\theta)}{2\theta(1-\theta)(2+\theta)}$$

for all values of $\theta$. It will be noted that the second
and third classes of offspring, the expectations of
which are the same functions of $\theta$, might have been
treated together without altering the result. In fact
we are only concerned with the total number in these
two classes, and not with the parts of which this total
is composed, in estimating the value of $\theta$. The fact
that these two classes are usually distinguishable
adds nothing to our information. The same applies
wherever distinguishable classes have proportional
frequencies.
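
The calculation of Table 37 may be checked numerically. The following is a minimal sketch in Python, using exact fractions; the function names are illustrative only and are not part of the original text. It sums the last column of the table, compares the sum with the closed expression for the information, and confirms that pooling the two singly recessive classes leaves the total unchanged.

```python
from fractions import Fraction


def table37_rows(theta, n=1):
    """Rows of Table 37 as (m, dm/dtheta, (dm/dtheta)^2 / m).

    theta is four times the expected proportion of double recessives;
    n is the number of offspring recorded (an integer).
    """
    theta = Fraction(theta)
    classes = [
        (n * theta / 4, Fraction(n, 4)),         # double recessive
        (n * (1 - theta) / 4, Fraction(-n, 4)),  # first single recessive
        (n * (1 - theta) / 4, Fraction(-n, 4)),  # second single recessive
        (n * (2 + theta) / 4, Fraction(n, 4)),   # double dominant
    ]
    return [(m, dm, dm * dm / m) for m, dm in classes]


def information(theta, n=1):
    """Total of the last column of Table 37."""
    return sum(row[2] for row in table37_rows(theta, n))


def closed_form(theta, n=1):
    """n(1 + 2 theta) / {2 theta (1 - theta)(2 + theta)}."""
    theta = Fraction(theta)
    return n * (1 + 2 * theta) / (2 * theta * (1 - theta) * (2 + theta))


def pooled_information(theta, n=1):
    """Same calculation with the two single-recessive classes thrown together."""
    theta = Fraction(theta)
    classes = [
        (n * theta / 4, Fraction(n, 4)),
        (n * (1 - theta) / 2, Fraction(-n, 2)),  # pooled class
        (n * (2 + theta) / 4, Fraction(n, 4)),
    ]
    return sum(dm * dm / m for m, dm in classes)


if __name__ == "__main__":
    for theta in (Fraction(1, 100), Fraction(1, 4), Fraction(81, 100)):
        print(theta, information(theta), closed_form(theta), pooled_information(theta))
```

Exact rational arithmetic is used here only so that the agreement of the three quantities is an identity rather than an approximation.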

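Knowing the information available respecting $\theta$,
we can now obtain the quantity of information
respecting $x$. For, since

$$\theta = x^2,$$

$$d\theta/dx = 2x,$$

and

$$i_x = i_\theta \left(\frac{d\theta}{dx}\right)^2.$$

Hence

$$i_x = 4\theta\, i_\theta = \frac{2(1+2\theta)}{(1-\theta)(2+\theta)},$$

where $i$ stands for the information per offspring, $I/n$.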

This quantity rises steadily from the value unity
when $\theta = 0$, the closest possible linkage in repulsion,
through $16/9$ when $\theta = \tfrac{1}{4}$, linkage being absent, to
an infinite value when $\theta = 1$, the limit of close linkage
in coupling. When linkage is at all close, therefore,
interbreeding of heterozygotes in coupling is
immensely more informative, for the same number of
offspring, than the interbreeding of heterozygotes in
repulsion. Roughly speaking, with 10 per cent.
recombination, coupling matings are worth about ten
times as much as repulsion matings; and if the
recombination fraction is as small as 5 per cent., they
are worth about twenty times as much.
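
The contrast between the two phases can be made concrete by evaluating the information per offspring for a given recombination fraction. The sketch below is illustrative only: the function names and the `phase` argument are assumptions introduced here, and the rule that $x$ is the recombination fraction in repulsion and its complement in coupling is taken from the text above.

```python
from fractions import Fraction


def info_per_offspring(recombination, phase):
    """Information about x per offspring of selfed double heterozygotes.

    recombination: the recombination fraction (float or Fraction).
    phase: "repulsion" (x = recombination) or "coupling" (x = 1 - recombination).
    Uses i_x = 2(1 + 2 theta) / {(1 - theta)(2 + theta)} with theta = x^2.
    The value is infinite (division by zero) for complete linkage in coupling.
    """
    if phase == "repulsion":
        x = recombination
    elif phase == "coupling":
        x = 1 - recombination
    else:
        raise ValueError("phase must be 'repulsion' or 'coupling'")
    theta = x * x
    return 2 * (1 + 2 * theta) / ((1 - theta) * (2 + theta))


def coupling_advantage(recombination):
    """Ratio of the information from coupling to that from repulsion."""
    return (info_per_offspring(recombination, "coupling")
            / info_per_offspring(recombination, "repulsion"))


if __name__ == "__main__":
    # Independent assortment: both phases give the same value, 16/9.
    print(info_per_offspring(Fraction(1, 2), "repulsion"))
    for r in (0.10, 0.05):
        print(r, round(coupling_advantage(r), 2))
```

For 10 and 5 per cent. recombination the ratio comes out somewhat below ten and twenty respectively, in agreement with the rough figures quoted.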

Lack of recognition of this great contrast between
the amounts of information supplied by these two
types of progenies has led, on several occasions in
the genetical literature, to curious misinterpretations
of the genetical results. Indeed, it greatly delayed
the discovery of the phenomenon of linkage itself, for
English geneticists, discussing undoubted cases of
linkage in plants, while observing the occurrence of
recombination among the coupling progenies, failed
to recognise its occurrence in the progenies from
heterozygotes in repulsion, and were led to believe
that these two different aspects of the same problem
followed different laws. The discovery of linkage was
thus delayed until animal geneticists, working with
a biparental organism, Drosophila, in which
back-crossing is as convenient as the interbreeding of
heterozygotes, demonstrated that the recombination
fraction was the same, irrespective of whether the two
dominant genes entered the cross from the same or
from different parents. Had the plant geneticists
been aware that a progeny of 200 offspring in repulsion
might be equivalent, in evidential value, to some
25 offspring in coupling, they would, perhaps, have
grown sufficiently numerous repulsion progenies to
have demonstrated the identity of the two phenomena
which had attracted their attention.

A number of further inferences of practical interest
follow from the evaluation of the amount of information
to be derived from progenies by self-fertilisation,
which the reader may usefully verify for himself.

(1) With close linkage, progenies obtained by
self-fertilising heterozygotes in coupling are of nearly
equivalent value with back-cross progenies. Thus
the advantage of back-crossing, when it is possible,
lies, in cases of close linkage, principally in the
opportunity it affords of eliminating, and of
evaluating, differential viability, and of detecting
any difference there may be in the recombination
fraction in male and female gametogenesis.

(2) When no double recessives are available, the
only double heterozygotes that can be formed are
in repulsion. Self-fertilising or interbreeding these
supplies very little information when the linkage is
close. When, however, on growing such a progeny,
this situation is found to have occurred, the plant
geneticist has usually the choice of two alternative
methods of adding to his information in the next
generation, (a) He may repeat his previous procedure
on a large scale, and (b) he may grow selfed progenies
from the last generation, and so ascertain which are
homozygous and which heterozygous, and among the
double heterozygotes, which are in coupling and which
in repulsion. Supposing the land and labour required
to grow each such family to be equivalent to that of
growing 25 self-fertilised plants of the kind first
obtained, procedure (b) will be the more profitable
when linkage is very close, and less profitable when
it is looser. It is an instructive problem to ascertain
at what linkage value the two methods are equally
advantageous. In considering this problem it should
be noted that in procedure (b) the geneticist may
choose to form families from the singly recessive
plants, or from the double dominants, or from both,
but has clearly nothing to learn from the doubly
recessive plants. The value of the second season’s
work will lie, not in the total information gained by
a complete classification, but only in information
additional to what has been gained by the first season’s
work. In the second season, however, there will be
some further information, from the progenies of
25 plants each from those self-fertilised plants which
happen to be double heterozygotes, and of these a
certain proportion must be expected to be in coupling.

72. Information as to Linkage derived from 
Human Families 

The greatest obstacle to the study of linkage in
man is that it is seldom possible to test or examine
for known factors so many as three generations of a
family showing any hereditary peculiarity. Consequently,
when double heterozygotes are found among
parents, it is not known, supposing there is linkage,
whether they are in coupling or repulsion. Apart
from recent race mixture, however, and other causes
of disturbance, these two phases may be expected to
occur in equal numbers and, indeed, this fact, when
true, can be verified from the family records of only
two generations. The possibility of obtaining from
such records indications of linkage was first proposed
by Bernstein, by the use of methods, however, which
do not in general utilise the whole of the information
in the record. The problem has since been more fully
discussed by Haldane and others. We shall here only
illustrate the general method of assessing the amount
of information obtainable by a classification of the
different kinds of families in the record.

Many rare anomalies are transmitted from
generation to generation by persons heterozygous
for the mutants responsible. If these and their
spouses and children are examined for some known
factor, such as the capacity for tasting
phenylthiocarbamide, a certain number will be heterozygous
tasters. Since homozygous tasters cannot be
discriminated from heterozygotes, this will only be
known if the affected parent is a taster, and if at least
one of the children is a non-taster. Only such
families can, therefore, be included in the record.
Apart from the classification of the children, such
families are of two kinds, (a) in which the normal
parent is a non-taster, for which Bernstein’s method
is satisfactory; and (b) in which the normal parent is a
heterozygous taster, for which it is less successful,
and which we may take as an example.

The families of two in such a record are of seven
possible kinds, which are shown in Table 38 below,
where distinguishable individuals are denoted as
follows: the affected A, normals a, tasters T,
non-tasters t. The first six kinds of family are arranged
in the table in pairs, each member of which has the
same frequency for heterozygotes in coupling as the
other has for repulsion. The combined frequency of
these two kinds of family is thus independent of the
relative frequency of these two kinds of heterozygotes,
while the equality of frequency of members of the
same pair will serve to confirm the view that the
two types of heterozygote are equally frequent, or,
if this were not so, to estimate their relative frequency.
We shall here be concerned only with the combined
frequency of these pairs. This combined frequency,
being a symmetrical function of the recombination
fraction, $x$, and its complement $1 - x$, may be simply
expressed in terms of the product,

$$\xi = x(1-x).$$

The expected frequencies are shown in the table
for a total of seven suitable families observed.

In order to assess the efficacy of the classification
in detecting linkage, we need to know the amount of
information which it provides in the limit for loose
linkage, when $x = \tfrac{1}{2}$ and $\xi = \tfrac{1}{4}$. After calculating,
therefore, the values of $dm/d\xi$ for the four types of
family to be distinguished, the frequencies are
rewritten for the particular value $\xi = \tfrac{1}{4}$, and the
amount of information calculated for this value in
the last column. It is easily seen that the total
amount of information is 80/3 for seven families,
or the information per family is

$$i = 80/21.$$

TABLE 38

  Types of Family     Frequency expected                   $m$            $dm/d\xi$    $m$                  $\frac{1}{m}\left(\frac{dm}{d\xi}\right)^2$
  AT  At  aT  at      Coupling          Repulsion                                      ($\xi=\tfrac{1}{4}$)   ($\xi=\tfrac{1}{4}$)

   0   0   0   2      $x^2$             $(1-x)^2$
   0   2   0   0      $(1-x)^2$         $x^2$              $1-2\xi$       $-2$         $\tfrac{1}{2}$        $8$

   1   0   0   1      $2x(1+x)$         $2(1-x)(2-x)$
   0   1   1   0      $2(1-x)(2-x)$     $2x(1+x)$          $4(1-\xi)$     $-4$         $3$                  $16/3$

   0   0   1   1      $2x(2-x)$         $2(1-x^2)$
   1   1   0   0      $2(1-x^2)$        $2x(2-x)$          $2(1+2\xi)$    $+4$         $3$                  $16/3$

   0   1   0   1      $2x(1-x)$         $2x(1-x)$          $2\xi$         $+2$         $\tfrac{1}{2}$        $8$

  Total                                                    $7$            $0$          $7$                  $80/3$
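
The combined frequencies of Table 38 can be recovered by direct enumeration. The following Python sketch is offered under stated assumptions and is not the author's own working: it takes one linkage phase of the affected parent (here, for illustration, A and T lying on the same chromosome), the normal parent being a heterozygous taster, and the function names are hypothetical. Because only the pooled totals of each pair are used, the same figures would be obtained for the other phase. The slopes $dm/d\xi$ are found exactly from two values of $x$, since each pooled frequency is linear in $\xi$.

```python
from fractions import Fraction
from itertools import combinations_with_replacement

CHILD_TYPES = ("AT", "At", "aT", "at")


def child_probabilities(x):
    """Phenotype probabilities of one child, x being the recombination fraction.

    Assumed phase (illustrative): affected parent AT/at, normal parent aa Tt.
    """
    x = Fraction(x)
    return {
        "AT": (2 - x) / 4,
        "At": x / 4,
        "aT": (1 + x) / 4,
        "at": (1 - x) / 4,
    }


def family_frequencies(x):
    """Expected numbers of each kind of two-child family, out of seven.

    Families without a non-taster child are excluded; the factor 16 scales
    the remaining probability, 7/16, to a total of seven families.
    """
    p = child_probabilities(x)
    freqs = {}
    for first, second in combinations_with_replacement(CHILD_TYPES, 2):
        if not (first.endswith("t") or second.endswith("t")):
            continue  # no non-taster child: family not admissible
        orderings = 1 if first == second else 2
        freqs[(first, second)] = 16 * orderings * p[first] * p[second]
    return freqs


def pooled_frequencies(x):
    """The four combined frequencies m of Table 38, in table order."""
    f = family_frequencies(x)
    return [
        f[("At", "At")] + f[("at", "at")],  # 1 - 2 xi
        f[("AT", "at")] + f[("At", "aT")],  # 4 (1 - xi)
        f[("AT", "At")] + f[("aT", "at")],  # 2 (1 + 2 xi)
        f[("At", "at")],                    # 2 xi
    ]


def information_at_loose_linkage():
    """Sum of (dm/dxi)^2 / m at xi = 1/4, slopes taken from two exact points."""
    x0, x1 = Fraction(1, 2), Fraction(1, 4)
    xi0, xi1 = x0 * (1 - x0), x1 * (1 - x1)
    m0, m1 = pooled_frequencies(x0), pooled_frequencies(x1)
    slopes = [(a - b) / (xi0 - xi1) for a, b in zip(m0, m1)]
    return sum(s * s / m for s, m in zip(slopes, m0))


if __name__ == "__main__":
    total = information_at_loose_linkage()
    print(total, total / 7)  # information for seven families, and per family
```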


The loss of information in Bernstein’s method 
arises from the fact that he draws no distinction 
between the types of family in the first and second 
pairs, or between the third pair and the last type of 
family. If we were to throw these together, so 
distinguishing only two groups of families, and relying 
on the relative frequencies of these two groups only 
for the detection of linkage, we should have the table 
set out below. 


TABLE 39

                    $m$           $dm/d\xi$     $m$                    $\frac{1}{m}\left(\frac{dm}{d\xi}\right)^2$
                                                ($\xi=\tfrac{1}{4}$)     ($\xi=\tfrac{1}{4}$)

  First group       $5-6\xi$      $-6$          $3\tfrac{1}{2}$         $72/7$
  Second group      $2+6\xi$      $+6$          $3\tfrac{1}{2}$         $72/7$

  Totals            $7$           $0$           $7$                    $144/7$

The amount of information available from seven
families, using Bernstein’s classification, is therefore
only 144/7, in place of 80/3 available when the
families are fully classified. The fraction of the
information utilised by Bernstein’s method is the ratio
of these two quantities, or 27/35. This ratio is termed
the efficiency of the method. For larger families its
value is found to be somewhat, but not much, lower,
the limiting value for large families being 9/16.
There is, however, no difficulty in utilising the whole
of the information available in the record, for families
of any size, once the loss of information, and its cause,
are recognised. The reader may find it instructive to
examine in like manner the classification of families
of three children.
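
The efficiency of the grouped classification may be verified in a few lines. This is a minimal sketch in Python, with names chosen here for illustration; each class is represented by the intercept and slope of its expected frequency regarded as a linear function of $\xi$, as given in Tables 38 and 39.

```python
from fractions import Fraction


def information(classes, xi):
    """Sum of (dm/dxi)^2 / m over classes with m = intercept + slope * xi."""
    xi = Fraction(xi)
    return sum(Fraction(slope) ** 2 / (intercept + slope * xi)
               for intercept, slope in classes)


# Full classification (Table 38): 1 - 2xi, 4(1 - xi), 2(1 + 2xi), 2xi.
FULL = [(1, -2), (4, -4), (2, 4), (0, 2)]

# Two groups only (Table 39): 5 - 6xi and 2 + 6xi.
GROUPED = [(5, -6), (2, 6)]

XI_LOOSE = Fraction(1, 4)  # value of xi when x = 1/2

full_info = information(FULL, XI_LOOSE)
grouped_info = information(GROUPED, XI_LOOSE)
efficiency = grouped_info / full_info

if __name__ == "__main__":
    print(full_info, grouped_info, efficiency)
```

The same function would serve for families of three children, once the expected frequencies of the several kinds of family have been written down as functions of $\xi$; these are not, in general, linear, so that the representation by intercept and slope would then need to be replaced by the frequencies and their derivatives.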

73. The Information elicited by Different 
Methods of Estimation 

The foregoing example illustrates the fact of very
general importance, that methods of estimation,
which proceed without reference to the possibility
of evaluating the quantity of information actually
contained in the data, are liable to be defective in
the quantity that they utilise. When, as is usual,
many methods of estimation are available, it becomes
important to be able to distinguish which use less,
which more, and which, if any, use all. Since the
method of measuring information, which has been
illustrated, is applicable to data of all kinds, it is
only necessary, in order to ascertain how much
information is utilised by any proposed method, to
determine the sampling distribution of the estimates
obtained by that method from quantities of data
of the same value as those observed. It is often
possible, though sometimes a matter of great
mathematical difficulty, to obtain the exact sampling
distribution of the estimate arrived at by any
particular method, and in such cases the amount
of information elicited by the estimate is that of a
single observation drawn from this distribution,
calculated exactly as in the cases illustrated above.

In many cases in which the exact distribution of
an estimate derived from a finite body of data is
unknown, it is easy to show that as the sample is
increased in magnitude, the sampling distribution
tends to the normal form with a calculable variance,
$V$, inversely proportional to the size of the sample,
so that

$$V = \frac{v}{n},$$

where $v$ is calculable for any chosen method of
estimation.

We shall now show, by a direct application of the
general method of calculating the information to such
a distribution of a proposed statistic, that the amount
of information elicited by the statistic is $n/v$.

Since, in the limiting case considered, the
distribution of the statistic becomes continuous and all
observable values of it are distinguishable, instead of
a summation over a number of classes, we shall be
concerned with an integration over all the elementary
ranges, $dT$, in which the statistic $T$ may be found to
lie. $T$, then, is known to be distributed about the
true value $\theta$ of the parameter, whatever it may be, of
which $T$ is an estimate, in a normal distribution with
known variance, $V$. The probability that it will be
found to lie in the infinitesimal range, $dT$, is therefore,

$$df = \frac{1}{\sqrt{2\pi V}}\, e^{-\frac{(T-\theta)^2}{2V}}\, dT.$$


Differentiating this with respect to $\theta$, in order to
ascertain how much information about $\theta$ the value of
$T$, regarded now as a single observation, provides,
we have

$$\frac{\partial}{\partial\theta}(df) = \frac{T-\theta}{V}\, df.$$

The square of this divided by $df$ is now seen to be

$$\frac{(T-\theta)^2}{V^2}\, df,$$

and the integration of this over all values of $T$ gives
simply

$$\frac{V}{V^2} = \frac{1}{V},$$


since, as is well known, the average value of $(T-\theta)^2$
is equal to $V$, $V$ being, in fact, the mean square
deviation, or variance, of the normal distribution.

Consequently, we have found that the amount of
information provided by an estimate, normally
distributed with variance $V$, is equal to $1/V$, the
invariance of that normal distribution. It is thus easy to
test whether in the limit for large samples any proposed
method of estimation tends to elicit the whole of the
information supplied by the data, or a lesser amount.
We have only to compare the quantity $1/V$ with $I$,
the amount known to be available; or, dividing
both of these quantities by $n$, to compare $1/v$ with
the amount of information, $i$, provided by each
individual observation. The ratio of the amount
elicited to the amount available is called the
“efficiency” of the method of estimation under
discussion, and it has been demonstrated, as the
common sense of the method requires, that the
efficiency can in no circumstances exceed unity.
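
The result $1/V$ may be confirmed by carrying out the integration numerically. The sketch below, in Python, is an illustration only: the midpoint rule, the range of integration and the function names are choices made here, not part of the argument of the text. It integrates the square of $(T-\theta)/V$ against the normal density, and adds a one-line helper expressing the efficiency as the ratio of $1/v$ to $i$.

```python
import math


def normal_density(t, theta, V):
    """Ordinate of the normal distribution with mean theta and variance V."""
    return math.exp(-(t - theta) ** 2 / (2 * V)) / math.sqrt(2 * math.pi * V)


def information_normal(theta, V, half_width=12.0, steps=20000):
    """Integral of {(T - theta)/V}^2 df over T, by the midpoint rule.

    The range is theta +/- half_width standard deviations, beyond which the
    contribution is negligible for the purposes of this check.
    """
    sd = math.sqrt(V)
    lower = theta - half_width * sd
    h = 2 * half_width * sd / steps
    total = 0.0
    for k in range(steps):
        t = lower + (k + 0.5) * h
        score = (t - theta) / V
        total += score * score * normal_density(t, theta, V) * h
    return total


def efficiency(v, i):
    """Ratio of the information elicited per observation, 1/v, to that available, i."""
    return (1.0 / v) / i


if __name__ == "__main__":
    V = 2.5
    print(information_normal(0.3, V), 1 / V)
```

For example, a method whose large-sample variance is $v/n$ with $v = 21/72$, applied where each observation supplies $i = 80/21$, has `efficiency(21/72, 80/21)` equal to 0.9.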

74. The Information lost in the Estimation 
of Error 

In the limit for large samples it is always possible
to obtain estimates of 100 per cent. efficiency, but
with small samples, when treated exactly, this is not
found to be generally possible. In some simple cases,
however, estimates may be made which in themselves
contain the whole of the information available
for finite samples. These especially valuable and
comprehensive estimates are called sufficient statistics,
and the great simplicity of the problems which fall
under the head of the theory of errors is due to the
fact that with the normal distribution both of the
quantities requiring estimation, the mean and the
variance, possess sufficient estimates. It is for this
reason that in so much experimental work we need
only be concerned with the precision of the total, or
mean, of the values observed, and with the estimation
of this precision from the sum of the squares of the
residual deviations.

There is, however, one point in connection with 
experiments involving measurements, to which the 
theory of errors is applicable, which may be cleared 
up by the methods of this chapter. 

When, as the result of an experiment, a value $x$
has been assigned a sampling variance, $s^2$, validly
and correctly estimated from $n$ degrees of freedom,
the position is not the same as if the variance were
known with exactitude. Our estimate of the variance
is itself subject to sampling error, and exact allowance
for such error is made by using the true distribution
of $t$, instead of the normal distribution, when testing
the significance of the deviation of our observed value
from any proposed hypothetical value. In view of this
procedure, it must be considered to be inexact to state
the amount of information supplied by the experiment
respecting the true value of which $x$ is an estimate,
merely as $1/s^2$, as though our estimate were known to
be normally distributed with this variance. We need,
in fact, in considering the absolute precision of an
experimental result, to take into account, not only
the estimate $s^2$ derived from the data, but also the
number of degrees of freedom upon which our
estimate, $s^2$, was based.

Now the probability that the quantity $t$, defined
by the relationship

$$x - \mu = st,$$

where $x$ is the observed value, and $\mu$ the hypothetical
value of which it is an estimate, shall lie in any assigned
range, $dt$, is given by the formula

$$df = \frac{\frac{n-1}{2}!}{\frac{n-2}{2}!\,\sqrt{\pi n}} \cdot \frac{dt}{\left(1+\frac{t^2}{n}\right)^{\frac{1}{2}(n+1)}},$$

or in terms of $x$ and $\mu$ by

$$df = \frac{\frac{n-1}{2}!}{\frac{n-2}{2}!\; s\sqrt{\pi n}} \cdot \frac{dx}{\left\{1+\frac{(x-\mu)^2}{ns^2}\right\}^{\frac{1}{2}(n+1)}}.$$

From this we can evaluate the amount of information
supplied by an observed value, $x$, relative to the
unknown parameter, $\mu$, as we have done with the
normal curve above, by differentiating with respect
to $\mu$. This gives

$$\frac{\partial}{\partial\mu}(df) = \frac{n+1}{ns^2} \cdot \frac{(x-\mu)\, df}{1+\frac{(x-\mu)^2}{ns^2}};$$

squaring this, and dividing by $df$, we find

$$\frac{(n+1)^2}{n^2 s^4} \cdot \frac{(x-\mu)^2\, df}{\left\{1+\frac{(x-\mu)^2}{ns^2}\right\}^2}.$$

When integrated over all possible values of the
observable quantity, $x$, this amounts to

$$\frac{n+1}{(n+3)\, s^2}.$$

It appears that the true precision of our estimate
is somewhat lower than it would have been, had the
variance been known with exactitude to be $s^2$. In
the extreme case, when $n = 1$, and the estimate is
based on only 1 degree of freedom, the precision is
halved. And in general, the true precision is less
than it might be thought, if the uncertainty of our
estimate of the variance were ignored, by the fraction
$2/(n+3)$. It may thus be worth while to sacrifice, to
some small extent, the aim of diminishing the value
of $s^2$, if this diminution carries with it any undue
reduction in the number of degrees of freedom
available for the estimation of error.
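
The value $(n+1)/\{(n+3)s^2\}$ may be checked by numerical integration. The following Python sketch is an illustration under assumptions made here, not a reproduction of the author's working: the substitution $x = \mu + s\sqrt{n}\,\tan\phi$, the midpoint rule and the function names are all choices of convenience. Under that substitution the element of frequency becomes proportional to $\cos^{n-1}\phi\, d\phi$, and the squared derivative divided by $df$ becomes $(n+1)^2 \sin^2\phi \cos^2\phi / (ns^2)$, so that the range of integration is finite.

```python
import math


def t_information(n, s=1.0, steps=20000):
    """Information about mu from one value x distributed as mu + s*t, t on n d.f.

    Integrates over phi in (-pi/2, pi/2), where x = mu + s*sqrt(n)*tan(phi).
    The factorials of the text are written as gamma functions:
    ((n-1)/2)! = Gamma((n+1)/2) and ((n-2)/2)! = Gamma(n/2).
    """
    # Constant of the distribution of phi: Gamma((n+1)/2) / {Gamma(n/2) sqrt(pi)}.
    c = math.exp(math.lgamma((n + 1) / 2) - math.lgamma(n / 2)
                 - 0.5 * math.log(math.pi))
    h = math.pi / steps
    total = 0.0
    for k in range(steps):
        phi = -math.pi / 2 + (k + 0.5) * h
        density = c * math.cos(phi) ** (n - 1)
        score_sq = ((n + 1) ** 2 * (math.sin(phi) * math.cos(phi)) ** 2
                    / (n * s * s))
        total += density * score_sq * h
    return total


def closed_form(n, s=1.0):
    """The value (n + 1) / {(n + 3) s^2} given in the text."""
    return (n + 1) / ((n + 3) * s * s)


if __name__ == "__main__":
    for n in (1, 2, 5, 10, 30):
        print(n, t_information(n), closed_form(n), 2 / (n + 3))
```

The last column printed is the fractional loss $2/(n+3)$, which falls from one-half with a single degree of freedom to about 6 per cent. with thirty.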

